Preface


My primary goal in writing Understanding Analysis was to create an elemen- 
tary one-semester book that exposes students to the rich rewards inherent in 
taking a mathematically rigorous approach to the study of functions of a real 
variable. The aim of a course in real analysis should be to challenge and im- 
prove mathematical intuition rather than to verify it. There is a tendency, 
however, to center an introductory course too closely around the familiar the- 
orems of the standard calculus sequence. Producing a rigorous argument that 
polynomials are continuous is good evidence for a well-chosen definition of con- 
tinuity, but it is not the reason the subject was created and certainly not the 
reason it should be required study. By shifting the focus to topics where an 
untrained intuition is severely disadvantaged (e.g., rearrangements of infinite 
series, nowhere-differentiable continuous functions, Cantor sets), my intent is to
bring an intellectual liveliness to this course by offering the beginning student 
access to some truly significant achievements of the subject. 


The Main Objectives 

Real analysis stands as a beacon of stability in the otherwise unpredictable evo- 
lution of the mathematics curriculum. Amid the various pedagogical revolutions 
in calculus, computing, statistics, and data analysis, nearly every undergradu- 
ate program continues to require at least one semester of real analysis. My 
own department once challenged this norm by creating a mathematical sciences 
track that allowed students to replace our two core proof-writing classes with 
electives in departments like physics and computer science. Within a few years, 
however, we concluded that the pieces did not hold together without a course in 
analysis. Analysis is, at once, a course in philosophy and applied mathematics. 
It is abstract and axiomatic in nature, but is engaged with the mathematics 
used by economists and engineers. 

How then do we teach a successful course to students with such diverse 
interests and expectations? Our desire to make analysis required study for wider 
audiences must be reconciled with the fact that many students find the subject 
quite challenging and even a bit intimidating. One unfortunate resolution of this 
dilemma is to make the course easier by making it less interesting. The omitted
material is inevitably what gives analysis its true flavor. A better solution is to 
find a way to make the more advanced topics accessible and worth the effort. 

I see three essential goals that a semester of real analysis should try to meet: 

1. Students need to be confronted with questions that expose the insufficiency 
of an informal understanding of the objects of calculus. The need for a 
more rigorous study should be carefully motivated. 

2. Having seen mainly intuitive or heuristic arguments, students need to learn 
what constitutes a rigorous mathematical proof and how to write one. 

3. Most importantly, there needs to be significant reward for the difficult 
work of firming up the logical structure of limits. Specifically, real anal- 
ysis should not be just an elaborate reworking of standard introductory 
calculus. Students should be exposed to the tantalizing complexities of 
the real line, to the subtleties of different flavors of convergence, and to 
the intellectual delights hidden in the paradoxes of the infinite. 

The philosophy of Understanding Analysis is to focus attention on questions 
that give analysis its inherent fascination. Does the Cantor set contain any 
irrational numbers? Can the set of points where a function is discontinuous 
be arbitrary? Are derivatives continuous? Are derivatives integrable? Is an 
infinitely differentiable function necessarily the limit of its Taylor series? In 
giving these topics center stage, the hard work of a rigorous study is justified 
by the fact that they are inaccessible without it. 


The Audience 

This book is an introductory text. The only prerequisite is a robust understanding
of the results from single-variable calculus. The theorems of linear algebra
are not needed, but the exposure to abstract arguments and proof writing that 
usually comes with this course would be a valuable asset. Complex numbers are 
never used. 

The proofs in Understanding Analysis are written with the beginning student 
firmly in mind. Brevity and other stylistic concerns are postponed in favor 
of including a significant level of detail. Most proofs come with a generous 
amount of discussion about the context of the argument. What should the 
proof entail? Which definitions are relevant? What is the overall strategy? 
Whenever there is a choice, efficiency is traded for an opportunity to reinforce 
some previously learned technique. Especially familiar or predictable arguments 
are often deferred to the exercises. 

The search for recurring ideas exists at the proof-writing level and also on 
the larger expository level. I have tried to give the course a narrative tone by 
picking up on the unifying themes of approximation and the transition from the 
finite to the infinite. Often when we ask a question in analysis the answer is 
“sometimes.” Can the order of a double summation be exchanged? Is term-by-term
differentiation of an infinite series allowed? By focusing on this recurring
pattern, each successive topic builds on the intuition of the previous one. The 
questions seem more natural, and a coherent story emerges from what might 
otherwise appear as a long list of theorems and proofs. 

This book always emphasizes core ideas over generality, and it makes no 
effort to be a complete, deductive catalog of results. It is designed to capture the 
intellectual imagination. Those who become interested are then exceptionally 
well prepared for a second course starting from complex-valued functions on 
more general spaces, while those content with a single semester come away with 
a strong sense of the essence and purpose of real analysis. 


The Structure of the Book 

Although the book finds its way to some sophisticated results, the main body 
of each chapter consists of a lean and focused treatment of the core topics 
that make up the center of most courses in analysis. Fundamental results about 
completeness, compactness, sequential and functional limits, continuity, uniform 
convergence, differentiation, and integration are all incorporated. 

What is specific here is where the emphasis is placed. In the chapter on inte- 
gration, for instance, the exposition revolves around deciphering the relationship 
between continuity and the Riemann integral. Enough properties of the integral 
are obtained to justify a proof of the Fundamental Theorem of Calculus, but 
the theme of the chapter is the pursuit of a characterization of integrable func- 
tions in terms of continuity. Whether or not Lebesgue’s measure-zero criterion 
is treated, framing the material in this way is still valuable because it is the 
questions that are important. Mathematics is not a static discipline. Students 
should be aware of the historical reasons for the creation of the mathematics 
they are learning and by extension realize that there is no last word on the 
subject. In the case of integration, this point is made explicitly by including 
some relatively modern developments on the generalized Riemann integral in 
the additional topics of the last chapter. 

The structure of the chapters has the following distinctive features. 

Discussion Sections: Each chapter begins with the discussion of some mo- 
tivating examples and open questions. The tone in these discussions is inten- 
tionally informal, and full use is made of familiar functions and results from 
calculus. The idea is to freely explore the terrain, providing context for the 
upcoming definitions and theorems. After these exploratory introductions, the 
tone of the writing changes, and the treatment becomes rigorously tight but 
still not overly formal. With the questions in place, the need for the ensuing 
development of the material is well motivated and the payoff is in sight. 

Project Sections: The penultimate section of each chapter (the final section is 
a short epilogue) is written with the exercises incorporated into the exposition. 
Proofs are outlined but not completed, and additional exercises are included to 
elucidate the material being discussed. The sections are written as self-guided 
tutorials, but they can also be the subject of lectures. I typically use them in
place of a final examination, and they work especially well as collaborative as- 
signments that can culminate in a class presentation. The body of each chapter 
contains the necessary tools, so there is some satisfaction in letting the students 
use their newly acquired skills to ferret out for themselves answers to questions 
that have been driving the exposition. 


Building a Course 

Although this book was originally designed for a 12-14-week semester, it has 
been used successfully in any number of formats including independent study. 
The dependence of the sections follows the natural ordering, but there is some 
flexibility as to what can be treated and omitted. 

• The introductory discussions to each chapter can be the subject of lecture, 
assigned as reading, omitted, or substituted with something preferable. 
There are no theorems proved here that show up later in the text. I do 
develop some important examples in these introductions (the Cantor set, 
Dirichlet’s nowhere-continuous function) that probably need to find their 
way into discussions at some point. 

• Chapter 3, Basic Topology of R, is much longer than it needs to be. All 
that is required by the ensuing chapters are fundamental results about 
open and closed sets and a thorough understanding of sequential com- 
pactness. The characterization of compactness using open covers as well 
as the section on perfect and connected sets are included for their own in- 
trinsic interest. They are not, however, crucial to any future proofs. The 
one exception to this is a presentation of the Intermediate Value Theorem 
(IVT) as a special case of the preservation of connected sets by continu- 
ous functions. To keep connectedness truly optional, I have included two 
direct proofs of IVT based on completeness results from Chapter 1. 

• All the project sections (1.6, 2.8, 3.5, 4.6, 5.4, 6.7, 7.6, 8.1-8.6) are optional
in the sense that no results in later chapters depend on material in these 
sections. The six topics covered in Chapter 8 are also written in this 
tutorial-style format, where the exercises make up a significant part of the 
development. The only one of these sections that might benefit from a 
lecture is the unit on Fourier series, which is a bit longer than the others. 


Changes in the Second Edition 

In light of the encouraging feedback — especially from students — I decided not 
to attempt any major alterations to the central narrative of the text as it was 
set out in the original edition. Some longer sections have been edited down, 
or in one case split in two, and the unit on Taylor series is now part of the 
core material of Chapter 6 instead of being relegated to the closing project
section. In contrast to the main body of the book, significant effort has gone 
into revising the exercises and projects. There are roughly 150 new exercises in 
this edition alongside 200 or so of what I feel are the most effective problems 
from the first edition. Some of these introduce new ideas not covered in the 
chapters (e.g., Euler’s constant, infinite products, inverse functions), but the 
majority are designed to kindle debates about the major ideas under discussion 
in what I hope are engaging ways. There are ample propositions to prove but 
also a good supply of Moore-method type exercises that require assessing the 
validity of various conjectures, deciphering invented definitions, or searching for 
examples that may not exist. 

The introductory discussion to Chapter 6 is new and tells the story of how 
Euler’s deft and audacious manipulations of power series led to a computation 
of $\sum 1/n^2$. Providing a proper proof for Euler’s sum is the topic of one of
three new project sections. The other two are a treatment of the Weierstrass 
Approximation Theorem and an exploration of how to best extend the domain of 
the factorial function to all of R. Each of these three topics represents a seminal 
achievement in the history of analysis, but my decision to include them has as 
much to do with the associated ideas that accompany the main proofs. For the 
Weierstrass Approximation Theorem, the particular argument that I chose relies 
on Taylor series and a deep understanding of uniform convergence, making it 
an ideal project to conclude Chapter 6. The journey to a proper definition of $x!$
allowed me to include a short unit on improper integrals and a proof of Leibniz’s 
rule for differentiating under the integral sign. The accompanying topics for the 
project on Euler’s sum are an analysis of the integral remainder formula for 
Taylor series and a proof of Wallis’s famous product formula for $\pi$. Yes, these
are challenging arguments, but they are also beautiful ideas. Returning to the
thesis of this text, it is my conviction that encounters with results like these 
make the task of learning analysis less daunting and more meaningful. They 
make the epsilons matter. 
Chapter 1


The Real Numbers 


1.1 Discussion: The Irrationality of $\sqrt{2}$

Toward the end of his distinguished career, the renowned British mathematician 
G.H. Hardy eloquently laid out a justification for a life of studying mathematics 
in A Mathematician’s Apology, an essay first published in 1940. At the center
of Hardy’s defense is the thesis that mathematics is an aesthetic discipline. For 
Hardy, the applied mathematics of engineers and economists held little charm. 
“Real mathematics,” as he referred to it, “must be justified as art if it can be 
justified at all.” 

To help make his point, Hardy includes two theorems from classical Greek 
mathematics, which, in his opinion, possess an elusive kind of beauty that, 
although difficult to define, is easy to recognize. The first of these results is 
Euclid’s proof that there are an infinite number of prime numbers. The second 
result is the discovery, attributed to the school of Pythagoras from around 500 
B.C., that $\sqrt{2}$ is irrational. It is this second theorem that demands our attention.
(A course in number theory would focus on the first.) The argument uses only 
arithmetic, but its depth and importance cannot be overstated. As Hardy says, 
“[It] is a ‘simple’ theorem, simple both in idea and execution, but there is no 
doubt at all about [it being] of the highest class. [It] is as fresh and significant as 
when it was discovered — two thousand years have not written a wrinkle on [it].” 

Theorem 1.1.1. There is no rational number whose square is 2. 

Proof. A rational number is any number that can be expressed in the form p/q,
where p and q are integers. Thus, what the theorem asserts is that no matter
how p and q are chosen, it is never the case that $(p/q)^2 = 2$. The line of attack
is indirect, using a type of argument referred to as a proof by contradiction. 
The idea is to assume that there is a rational number whose square is 2 and 
then proceed along logical lines until we reach a conclusion that is unacceptable. 


© Springer Science+Business Media New York 2015
At this point, we will be forced to retrace our steps and reject the erroneous
assumption that some rational number squared is equal to 2. In short, we will 
prove that the theorem is true by demonstrating that it cannot be false. 

And so assume, for contradiction, that there exist integers p and q satisfying

(1) $\left(\frac{p}{q}\right)^2 = 2.$

We may also assume that p and q have no common factor, because, if they had 
one, we could simply cancel it out and rewrite the fraction in lowest terms. Now, 
equation (1) implies 

(2) $p^2 = 2q^2.$

From this, we can see that the integer $p^2$ is an even number (it is divisible by 2),
and hence p must be even as well because the square of an odd number is odd.
This allows us to write $p = 2r$, where r is also an integer. If we substitute $2r$
for p in equation (2), then a little algebra yields the relationship

$2r^2 = q^2.$

But now the absurdity is at hand. This last equation implies that $q^2$ is even,
and hence q must also be even. Thus, we have shown that p and q are both 
even (i.e., divisible by 2) when they were originally assumed to have no common 
factor. From this logical impasse, we can only conclude that equation (1) cannot 
hold for any integers p and q, and thus the theorem is proved. □ 
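
As a finite sanity check of Theorem 1.1.1 (and in no way a substitute for the proof), the following Python sketch searches for integers p and q with $p^2 = 2q^2$. The function name `rational_square_roots_of_two` and the search bound are illustrative assumptions, not part of the text.

```python
from math import isqrt

def rational_square_roots_of_two(max_q):
    """Return all pairs (p, q) with 1 <= q <= max_q and p*p == 2*q*q."""
    hits = []
    for q in range(1, max_q + 1):
        target = 2 * q * q
        p = isqrt(target)        # largest integer with p*p <= target
        if p * p == target:      # would mean (p/q)^2 = 2 exactly
            hits.append((p, q))
    return hits

# The theorem predicts an empty list, whatever bound is chosen.
print(rational_square_roots_of_two(10_000))
```

A search of this kind can only ever rule out finitely many denominators; it is the parity argument above that disposes of all of them at once.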

A component of Hardy’s definition of beauty in a mathematical theorem 
is that the result have lasting and serious implications for a network of other 
mathematical ideas. In this case, the ideas under assault were the Greeks’ under- 
standing of the relationship between geometric length and arithmetic number. 
Prior to the preceding discovery, it was an assumed and commonly used fact 
that, given two line segments AB and CD, it would always be possible to find 
a third line segment whose length divides evenly into the first two. In modern 
terminology, this is equivalent to asserting that the length of CD is a rational 
multiple of the length of AB. Looking at the diagonal of a unit square (Fig. 1.1), 
it now followed (using the Pythagorean Theorem) that this was not always the 
case. Because the Pythagoreans implicitly interpreted number to mean rational 
number, they were forced to accept that number was a strictly weaker notion 
than length. 

Rather than abandoning arithmetic in favor of geometry (as the Greeks seem 
to have done), our resolution to this limitation is to strengthen the concept of 
number by moving from the rational numbers to a larger number system. From 
a modern point of view, this should seem like a familiar and somewhat natural 
phenomenon. We begin with the natural numbers 


$\mathbf{N} = \{1, 2, 3, 4, 5, \ldots\}.$

Figure 1.1: $\sqrt{2}$ exists as a geometric length.

The influential German mathematician Leopold Kronecker (1823-1891) once 
asserted that “The natural numbers are the work of God. All of the rest is 
the work of mankind.” Debating the validity of this claim is an interesting 
conversation for another time. For the moment, it at least provides us with 
a place to start. If we restrict our attention to the natural numbers N, then 
we can perform addition perfectly well, but we must extend our system to the 
integers 

$\mathbf{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$

if we want to have an additive identity (zero) and the additive inverses necessary 
to define subtraction. The next issue is multiplication and division. The number 
1 acts as the multiplicative identity, but in order to define division we need to 
have multiplicative inverses. Thus, we extend our system again to the rational 
numbers 

$\mathbf{Q} = \left\{ \text{all fractions } \frac{p}{q} \text{ where } p \text{ and } q \text{ are integers with } q \neq 0 \right\}.$

Taken together, the properties of Q discussed in the previous paragraph 
essentially make up the definition of what is called a field. More formally stated, 
a field is any set where addition and multiplication are well-defined operations 
that are commutative, associative, and obey the familiar distributive property 
$a(b + c) = ab + ac$. There must be an additive identity, and every element must
have an additive inverse. Finally, there must be a multiplicative identity, and 
multiplicative inverses must exist for all nonzero elements of the field. Neither 
Z nor N is a field. The finite set $\{0, 1, 2, 3, 4\}$ is a field when addition and
multiplication are computed modulo 5. This is not immediately obvious but 
makes an interesting exercise. 
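
For readers who would like to experiment before attempting the exercise, here is a minimal brute-force sketch in Python that tests the field properties listed above for arithmetic modulo n. The helper name `is_field_mod` is our own invention, and an exhaustive check of this kind is only feasible because the set is finite.

```python
from itertools import product

def is_field_mod(n):
    """Brute-force check of the field axioms for {0, ..., n-1} with arithmetic mod n."""
    if n < 2:
        return False  # assumption: we require the identities 0 and 1 to be distinct
    elems = range(n)
    add = lambda a, b: (a + b) % n
    mul = lambda a, b: (a * b) % n
    for a, b, c in product(elems, repeat=3):
        # commutativity
        if add(a, b) != add(b, a) or mul(a, b) != mul(b, a):
            return False
        # associativity
        if add(add(a, b), c) != add(a, add(b, c)):
            return False
        if mul(mul(a, b), c) != mul(a, mul(b, c)):
            return False
        # distributive property a(b + c) = ab + ac
        if mul(a, add(b, c)) != add(mul(a, b), mul(a, c)):
            return False
    # identities: 0 for addition, 1 for multiplication
    if any(add(a, 0) != a or mul(a, 1) != a for a in elems):
        return False
    # every element needs an additive inverse
    if any(all(add(a, b) != 0 for b in elems) for a in elems):
        return False
    # every nonzero element needs a multiplicative inverse
    if any(all(mul(a, b) != 1 for b in elems) for a in elems if a != 0):
        return False
    return True

print(is_field_mod(5))  # True
print(is_field_mod(4))  # False: 2 has no multiplicative inverse mod 4
```

Comparing the outcomes for 4 and 5 suggests where the real content of the exercise lies: the existence of multiplicative inverses.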

The set Q also has a natural order defined on it. Given any two rational 
numbers r and s, exactly one of the following is true: 

$r < s, \quad r = s, \quad \text{or} \quad r > s.$


Figure 1.2: Approximating $\sqrt{2}$ with rational numbers.

This ordering is transitive in the sense that if r < s and s < t, then r < t, so
we are conveniently led to a mental picture of the rational numbers as being 
laid out from left to right along a number line. Unlike Z, there are no intervals 
of empty space. Given any two rational numbers r < s, the rational number 
(r + s)/2 sits halfway in between, implying that the rational numbers are densely 
nestled together. 
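
The midpoint observation can be repeated as often as we like, which is one way to see that there are infinitely many rationals between any two given ones. The sketch below uses Python's exact `Fraction` type; the function name `midpoints` and the sample endpoints 1/3 and 1/2 are chosen purely for illustration.

```python
from fractions import Fraction

def midpoints(r, s, count):
    """Return `count` distinct rationals strictly between r and s (assumes r < s)."""
    found = []
    upper = s
    for _ in range(count):
        upper = (r + upper) / 2   # midpoint of r and the current upper endpoint
        found.append(upper)
    return found

between = midpoints(Fraction(1, 3), Fraction(1, 2), 5)
print(between)
```

Each new value lies strictly between r and the previous one, so no value is ever repeated.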

With the field properties of Q allowing us to safely carry out the algebraic 
operations of addition, subtraction, multiplication, and division, let’s remind 
ourselves just what it is that Q is lacking. By Theorem 1.1.1, it is apparent 
that we cannot always take square roots. The problem, however, is actually 
more fundamental than this. Using only rational numbers, it is possible to 
approximate $\sqrt{2}$ quite well (Fig. 1.2). For instance, $1.414^2 = 1.999396$. By
adding more decimal places to our approximation, we can get even closer to
a value for $\sqrt{2}$, but, even so, we are now well aware that there is a “hole” in
the rational number line where $\sqrt{2}$ ought to be. Of course, there are quite a
few other holes (at $\sqrt{3}$ and $\sqrt{5}$, for example). Returning to the dilemma of the
ancient Greek mathematicians, if we want every length along the number line to
correspond to an actual number, then another extension to our number system
is in order. Thus, to the chain $\mathbf{N} \subseteq \mathbf{Z} \subseteq \mathbf{Q}$ we append the real numbers $\mathbf{R}$.
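
To make the idea of a “hole” concrete, the following sketch computes rational decimal approximations whose squares creep up toward 2 without ever reaching it. It relies on exact rational arithmetic from Python's standard library; the function name `decimal_lower_approximation` is hypothetical.

```python
from fractions import Fraction
from math import isqrt

def decimal_lower_approximation(digits):
    """Largest rational with `digits` decimal places whose square is at most 2."""
    scale = 10 ** digits
    # isqrt(2 * scale^2) is the integer part of sqrt(2) * scale
    return Fraction(isqrt(2 * scale * scale), scale)

for digits in range(1, 7):
    r = decimal_lower_approximation(digits)
    gap = 2 - r * r              # exact, and strictly positive by Theorem 1.1.1
    print(digits, r, float(gap))
```

With three decimal places the approximation is 1414/1000, matching the value 1.414 used above. The gap shrinks as digits are added, but it is never zero.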

The question of how to actually construct R from Q is rather complicated 
business. It is discussed in Section 1.3, and then again in more detail in Sec- 
tion 8.6. For the moment, it is not too inaccurate to say that R is obtained by 
filling in the gaps in Q. Wherever there is a hole, a new irrational number is 
defined and placed into the ordering that already exists on Q. The real numbers 
are then the union of these irrational numbers together with the more familiar 
rational ones. What properties does the set of irrational numbers have? How 
do the sets of rational and irrational numbers fit together? Is there a kind of 
symmetry between the rationals and the irrationals, or is there some sense in 
which we can argue that one type of real number is more common than the 
other? The one method we have seen so far for generating examples of irra- 
tional numbers is through square roots. Not too surprisingly, other roots such 
as $\sqrt[3]{2}$ or $\sqrt[5]{3}$ are most often irrational. Can all irrational numbers be expressed
as algebraic combinations of nth roots and rational numbers, or are there still 
other irrational numbers beyond those of this form? 
1.2 Some Preliminaries

The vocabulary necessary for the ensuing development comes from set theory 
and the theory of functions. This should be familiar territory, but a brief review 
of the terminology is probably a good idea, if only to establish some agreed-upon 
notation. 

Sets 

Intuitively speaking, a set is any collection of objects. These objects are referred 
to as the elements of the set. For our purposes, the sets in question will most 
often be sets of real numbers, although we will also encounter sets of functions 
and, on a few occasions, sets whose elements are other sets. 

Given a set A, we write $x \in A$ if x (whatever it may be) is an element of A.
If x is not an element of A, then we write $x \notin A$. Given two sets A and B, the
union is written $A \cup B$ and is defined by asserting that

$x \in A \cup B$ provided that $x \in A$ or $x \in B$ (or potentially both).

The intersection $A \cap B$ is the set defined by the rule

$x \in A \cap B$ provided $x \in A$ and $x \in B$.

Example 1.2.1. (i) There are many acceptable ways to assert the contents
of a set. In the previous section, the set of natural numbers was defined
by listing the elements: $\mathbf{N} = \{1, 2, 3, \ldots\}$.

(ii) Sets can also be described in words. For instance, we can define the set E 
to be the collection of even natural numbers. 

(iii) Sometimes it is more efficient to provide a kind of rule or algorithm for 
determining the elements of a set. As an example, let 

$S = \{r \in \mathbf{Q} : r^2 < 2\}.$

Read aloud, the definition of S says, “Let S be the set of all rational
numbers whose squares are less than 2.” It follows that $1 \in S$, $4/3 \in S$,
but $3/2 \notin S$ because $9/4 > 2$.

Using the previously defined sets to illustrate the operations of intersection 
and union, we observe that 

$\mathbf{N} \cup E = \mathbf{N}$, $\mathbf{N} \cap E = E$, $\mathbf{N} \cap S = \{1\}$, and $E \cap S = \emptyset$.

The set $\emptyset$ is called the empty set and is understood to be the set that contains
no elements. An equivalent statement would be to say that E and S are
disjoint. 
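
The set identities in Example 1.2.1 can be explored on a computer, with one caveat: a program can only hold a finite window of N. In the sketch below, the cutoff `LIMIT` and the membership test `in_S` are our own illustrative choices, so the output is evidence for the identities rather than a proof of them.

```python
from fractions import Fraction

LIMIT = 50  # assumption: a finite window standing in for the infinite set N
N = set(range(1, LIMIT + 1))
E = {n for n in N if n % 2 == 0}    # the even natural numbers in the window

def in_S(r):
    """Membership test for S = {r in Q : r^2 < 2}."""
    return Fraction(r) ** 2 < 2

print(N | E == N)                   # union of N and E is N
print(N & E == E)                   # intersection of N and E is E
print({n for n in N if in_S(n)})    # intersection of N and S
print({n for n in E if in_S(n)})    # intersection of E and S (empty)
print(in_S(1), in_S(Fraction(4, 3)), in_S(Fraction(3, 2)))
```

Describing S by a rule rather than a list is what lets `in_S` decide membership for any rational number we hand it.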
A word about the equality of two sets is in order (since we have just used the
notion). The inclusion relationship $A \subseteq B$ or $B \supseteq A$ is used to indicate that
every element of A is also an element of B. In this case, we say A is a subset of
B, or B contains A. To assert that $A = B$ means that $A \subseteq B$ and $B \subseteq A$. Put
another way, A and B have exactly the same elements. 

Quite frequently in the upcoming chapters, we will want to apply the union 
and intersection operations to infinite collections of sets. 

Example 1.2.2. Let 


$A_1 = \mathbf{N} = \{1, 2, 3, \ldots\},$

$A_2 = \{2, 3, 4, \ldots\},$

$A_3 = \{3, 4, 5, \ldots\},$

and, in general, for each $n \in \mathbf{N}$, define the set

$A_n = \{n, n + 1, n + 2, \ldots\}.$
The result is a nested chain of sets 


$A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \supseteq \cdots,$

where each successive set is a subset of all the previous ones. Notationally, 

$\bigcup_{n=1}^{\infty} A_n, \quad \bigcup_{n \in \mathbf{N}} A_n, \quad \text{or} \quad A_1 \cup A_2 \cup A_3 \cup \cdots$

are all equivalent ways to indicate the set whose elements consist of any element
that appears in at least one particular $A_n$. Because of the nested property of
this particular collection of sets, it is not too hard to see that

$\bigcup_{n=1}^{\infty} A_n = A_1.$

The notion of intersection has the same kind of natural extension to infinite 
collections of sets. For this example, we have 

$\bigcap_{n=1}^{\infty} A_n = \emptyset.$

Let’s be sure we understand why this is the case. Suppose we had some natural
number m that we thought might actually satisfy $m \in \bigcap_{n=1}^{\infty} A_n$. What this
would mean is that $m \in A_n$ for every $A_n$ in our collection of sets. Because m
is not an element of $A_{m+1}$, no such m exists and the intersection is empty.
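
Because each $A_n$ is infinite, it cannot be stored as a list, but it can be modeled by a membership test. The sketch below does this and replays the argument just given: for each candidate m, it produces an index n for which m fails to belong to $A_n$. Both function names are hypothetical.

```python
def in_A(n, m):
    """Membership test: is m an element of A_n = {n, n+1, n+2, ...}?"""
    return m >= n

def witness_excluding(m):
    """Return an index n with m not in A_n, following the argument in the text."""
    return m + 1

for m in range(1, 6):
    n = witness_excluding(m)
    print(m, n, in_A(n, m))   # the last column is always False
```

Every natural number m is excluded by some set in the collection, which is exactly what it means for the intersection to be empty.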

As mentioned, most of the sets we encounter will be sets of real numbers. 
Given $A \subseteq \mathbf{R}$, the complement of A, written $A^c$, refers to the set of all elements
of $\mathbf{R}$ not in A. Thus, for $A \subseteq \mathbf{R}$,

$A^c = \{x \in \mathbf{R} : x \notin A\}.$
A few times in our work to come, we will refer to De Morgan’s Laws, which
state that

$(A \cap B)^c = A^c \cup B^c$ and $(A \cup B)^c = A^c \cap B^c$.

Proofs of these statements are discussed in Exercise 1.2.5. 
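
Before writing the proofs, it can be reassuring to test De Morgan's Laws exhaustively on a small example. The sketch below replaces R with a five-element universe (an assumption made so that every pair of subsets can be checked) and takes complements relative to that universe. The function names are our own.

```python
from itertools import combinations

def subsets(universe):
    """Yield every subset of a finite universe."""
    items = sorted(universe)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)

def de_morgan_holds(universe):
    """Check both of De Morgan's Laws for all pairs of subsets of the universe."""
    U = set(universe)
    for A in subsets(U):
        for B in subsets(U):
            # complement of the intersection equals the union of the complements
            if U - (A & B) != (U - A) | (U - B):
                return False
            # complement of the union equals the intersection of the complements
            if U - (A | B) != (U - A) & (U - B):
                return False
    return True

print(de_morgan_holds(range(5)))  # True
```

A finite check like this says nothing about subsets of R in general, which is why the element-by-element argument of Exercise 1.2.5 is still needed.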

Admittedly, there is something imprecise about the definition of set pre- 
sented at the beginning of this discussion. The defining sentence begins with 
the phrase “Intuitively speaking,” which might seem an odd way to embark on a 
course of study that purportedly intends to supply a rigorous foundation for the 
theory of functions of a real variable. In some sense, however, this is unavoid- 
able. Each repair of one level of the foundation reveals something below it in 
need of attention. The theory of sets has been subjected to intense scrutiny over 
the past century precisely because so much of modern mathematics rests on this 
foundation. But such a study is really only advisable once it is understood why 
our naive impression about the behavior of sets is insufficient. For the direction 
in which we are heading, this will not happen, although an indication of some 
potential pitfalls is given in Section 1.7. 

Functions 

Definition 1.2.3. Given two sets A and B, a function from A to B is a rule or
mapping that takes each element $x \in A$ and associates with it a single element
of B. In this case, we write $f : A \to B$. Given an element $x \in A$, the expression
$f(x)$ is used to represent the element of B associated with x by f. The set A is
called the domain of f. The range of f is not necessarily equal to B but refers
to the subset of B given by $\{y \in B : y = f(x) \text{ for some } x \in A\}$.

This definition of function is more or less the one proposed by Peter Lejeune 
Dirichlet (1805-1859) in the 1830s. Dirichlet was a German mathematician who 
was one of the leaders in the development of the rigorous approach to functions 
that we are about to undertake. His main motivation was to unravel the issues 
surrounding the convergence of Fourier series. Dirichlet’s contributions figure 
prominently in Section 8.5, where an introduction to Fourier series is presented, 
but we will also encounter his name in several earlier chapters along the way. 
What is important at the moment is that we see how Dirichlet’s definition 
of function liberates the term from its interpretation as a type of “formula.” 
In the years leading up to Dirichlet’s time, the term “function” was generally 
understood to refer to algebraic entities such as $f(x) = x^2 + 1$ or $g(x) = \sqrt{x^4 + 4}$.
Definition 1.2.3 allows for a much broader range of possibilities. 

Example 1.2.4. In 1829, Dirichlet proposed the unruly function 

$g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$

The domain of g is all of R, and the range is the set {0, 1}. There is no single 
formula for g in the usual sense, and it is quite difficult to graph this function 
(see Section 4.1 for a rough attempt), but it certainly qualifies as a function 
according to the criterion in Definition 1.2.3. As we study the theoretical nature
of continuous, differentiable, or integrable functions, examples such as this one 
will provide us with an invaluable testing ground for the many conjectures we 
encounter. 


Example 1.2.5 (Triangle Inequality). The absolute value function is so 
important that it merits the special notation $|x|$ in place of the usual $f(x)$ or
$g(x)$. It is defined for every real number via the piecewise definition


$|x| = \begin{cases} x & \text{if } x \geq 0 \\ -x & \text{if } x < 0. \end{cases}$


With respect to multiplication and division, the absolute value function satisfies 


(i) $|ab| = |a||b|$ and (ii) $|a + b| \leq |a| + |b|$


for all choices of a and b. Verifying these properties (Exercise 1.2.6) is just a 
matter of examining the different cases that arise when a, b, and a + b are positive
and negative. Property (ii) is called the triangle inequality. This innocuous 
looking inequality turns out to be fantastically important and will be frequently 
employed in the following way. Given three real numbers a, b, and c, we certainly
have

$|a - b| = |(a - c) + (c - b)|.$

By the triangle inequality,

$|(a - c) + (c - b)| \leq |a - c| + |c - b|,$

so we get

(1) $|a - b| \leq |a - c| + |c - b|.$


Now, the expression $|a - b|$ is equal to $|b - a|$ and is best understood as the distance
between the points a and b on the number line. With this interpretation,
equation (1) makes the plausible statement that the distance from a to b is less 
than or equal to the distance from a to c plus the distance from c to b. Pre- 
tending for a moment that these are points in the plane (instead of on the real 
line), it should be evident why this is referred to as the “triangle inequality.” 
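
The distance form of the triangle inequality is easy to test numerically. The sketch below does so on randomly chosen rational numbers, using exact `Fraction` arithmetic so that rounding error cannot produce a spurious failure. The function names, the fixed seed, and the sampling ranges are all illustrative assumptions.

```python
import random
from fractions import Fraction

def triangle_inequality_holds(a, b, c):
    """Check |a - b| <= |a - c| + |c - b| for three numbers."""
    return abs(a - b) <= abs(a - c) + abs(c - b)

def random_rational(rng):
    """Return a rational with numerator in [-100, 100] and denominator in [1, 100]."""
    return Fraction(rng.randint(-100, 100), rng.randint(1, 100))

rng = random.Random(0)  # fixed seed so that repeated runs test the same triples
triples = [tuple(random_rational(rng) for _ in range(3)) for _ in range(1000)]
print(all(triangle_inequality_holds(a, b, c) for a, b, c in triples))  # True
```

Equality occurs exactly when c lies between a and b on the number line, for instance when a = 0, c = 1, and b = 2.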


Logic and Proofs 

Writing rigorous mathematical proofs is a skill best learned by doing, and there 
is plenty of on-the-job training just ahead. As Hardy indicates, there is an artis- 
tic quality to mathematics of this type, which may or may not come easily, but 
that is not to say that anything especially mysterious is happening. A proof is 
an essay of sorts. It is a set of carefully crafted directions, which, when followed, 
should leave the reader absolutely convinced of the truth of the proposition in 
question. To achieve this, the steps in a proof must follow logically from pre- 
vious steps or be justified by some other agreed-upon set of facts. In addition 
to being valid, these steps must also fit coherently together to form a cogent 
argument. Mathematics has a specialized vocabulary, to be sure, but that does 
not exempt a good proof from being written in grammatically correct English. 

The one proof we have seen at this point (to Theorem 1.1.1) uses an indirect 
strategy called proof by contradiction. This powerful technique will be employed 
a number of times in our upcoming work. Nevertheless, most proofs are direct. 
(It also bears mentioning that using an indirect proof when a direct proof is 
available is generally considered bad form.) A direct proof begins from some 
valid statement, most often taken from the theorem’s hypothesis, and then pro- 
ceeds through rigorously logical deductions to a demonstration of the theorem’s 
conclusion. As we saw in Theorem 1.1.1, an indirect proof always begins by 
negating what it is we would like to prove. This is not always as easy to do as it 
may sound. The argument then proceeds until (hopefully) a logical contradic- 
tion with some other accepted fact is uncovered. Many times, this accepted fact 
is part of the hypothesis of the theorem. When the contradiction is with the 
theorem’s hypothesis, we technically have what is called a contrapositive proof. 

The next proposition illustrates a number of the issues just discussed and 
introduces a few more. 


Theorem 1.2.6. Two real numbers $a$ and $b$ are equal if and only if for every 
real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$. 

Proof. There are two key phrases in the statement of this proposition that 
warrant special attention. One is “for every,” which will be addressed in a 
moment. The other is “if and only if.” To say “if and only if” in mathematics 
is an economical way of stating that the proposition is true in two directions. 
In the forward direction, we must prove the statement: 

($\Rightarrow$) If $a = b$, then for every real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$. 

We must also prove the converse statement: 

($\Leftarrow$) If for every real number $\epsilon > 0$ it follows that $|a - b| < \epsilon$, then we must 
have $a = b$. 

For the proof of the first statement, there is really not much to say. If $a = b$, 
then $|a - b| = 0$, and so certainly $|a - b| < \epsilon$ no matter what $\epsilon > 0$ is chosen. 

For the second statement, we give a proof by contradiction. The conclusion 
of the proposition in this direction states that $a = b$, so we assume that $a \ne b$. 
Heading off in search of a contradiction brings us to a consideration of the phrase 
“for every $\epsilon > 0$.” Some equivalent ways to state the hypothesis would be to 
say that “for all possible choices of $\epsilon > 0$” or “no matter how $\epsilon > 0$ is selected, 
it is always the case that $|a - b| < \epsilon$.” But assuming $a \ne b$ (as we are doing at 
the moment), the choice of 

$$\epsilon_0 = |a - b| > 0$$

poses a serious problem. We are assuming that $|a - b| < \epsilon$ is true for every 
$\epsilon > 0$, so this must certainly be true of the particular $\epsilon_0$ just defined. However, 
the statements 

$$|a - b| < \epsilon_0 \quad \text{and} \quad |a - b| = \epsilon_0$$

cannot both be true. This contradiction means that our initial assumption that 
$a \ne b$ is unacceptable. Therefore, $a = b$, and the indirect proof is complete. □ 
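
The heart of the indirect argument is that a single, well-chosen $\epsilon_0$ defeats the hypothesis whenever $a \ne b$. Here is a minimal Python sketch of that choice; the function name `failing_epsilon` is hypothetical, and rational inputs stand in for real numbers.

```python
from fractions import Fraction


def failing_epsilon(a, b):
    """Return an epsilon > 0 for which |a - b| < epsilon is false.

    Returns None when a == b, since then no such epsilon exists.
    """
    if a == b:
        return None
    return abs(a - b)  # epsilon_0 = |a - b| > 0, the choice made in the proof


a, b = Fraction(1, 3), Fraction(1, 2)
eps0 = failing_epsilon(a, b)

# The hypothesis "|a - b| < epsilon for every epsilon > 0" breaks at eps0.
hypothesis_fails = not (abs(a - b) < eps0)
```

Any value in the interval $(0, |a - b|]$ would serve equally well; the proof picks the right-hand endpoint because it is the simplest to name.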


One of the most fundamental skills required for reading and writing analysis 
proofs is the ability to confidently manipulate the quantifying phrases “for all” 
and “there exists.” Significantly more attention will be given to this issue in 
many upcoming discussions. 


Induction 

One final trick of the trade, which will arise with some frequency, is the use of 
induction arguments. Induction is used in conjunction with the natural numbers 
$\mathbf{N}$ (or sometimes with the set $\mathbf{N} \cup \{0\}$). The fundamental principle behind 
induction is that if $S$ is some subset of $\mathbf{N}$ with the property that 

(i) $S$ contains 1 and 

(ii) whenever $S$ contains a natural number $n$, it also contains $n + 1$, 

then it must be that $S = \mathbf{N}$. As the next example illustrates, this principle can 
be used to define sequences of objects as well as to prove facts about them. 

Example 1.2.7. Let $x_1 = 1$, and for each $n \in \mathbf{N}$ define 

$$x_{n+1} = (1/2)x_n + 1.$$

Using this rule, we can compute $x_2 = (1/2)(1) + 1 = 3/2$, $x_3 = 7/4$, and it is 
immediately apparent how this leads to a definition of $x_n$ for all $n \in \mathbf{N}$. 

The sequence just defined appears at the outset to be increasing. For the 
terms computed, we have $x_1 \le x_2 \le x_3$. Let’s use induction to prove that this 
trend continues; that is, let’s show 

(2)    $x_n \le x_{n+1}$

for all values of $n \in \mathbf{N}$. 

For $n = 1$, $x_1 = 1$ and $x_2 = 3/2$, so that $x_1 \le x_2$ is clear. Now, we want to 
show that 

if we have $x_n \le x_{n+1}$, then it follows that $x_{n+1} \le x_{n+2}$. 

Think of $S$ as the set of natural numbers for which the claim in equation (2) 
is true. We have shown that $1 \in S$. We are now interested in showing that if 
$n \in S$, then $n + 1 \in S$ as well. Starting from the induction hypothesis $x_n \le x_{n+1}$, 
we can multiply across the inequality by 1/2 and add 1 to get 

$$\frac{1}{2}x_n + 1 \le \frac{1}{2}x_{n+1} + 1,$$

which is precisely the desired conclusion $x_{n+1} \le x_{n+2}$. By induction, the claim 
is proved for all $n \in \mathbf{N}$. 
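
The recursive rule in Example 1.2.7 translates directly into a loop. The sketch below (the name `sequence_terms` is ours) computes the first few terms exactly and checks the claim in (2) for those terms only; the induction argument is what covers every $n$.

```python
from fractions import Fraction


def sequence_terms(count):
    """Return [x_1, ..., x_count] for count >= 1, where
    x_1 = 1 and x_{n+1} = (1/2) x_n + 1."""
    terms = [Fraction(1)]
    while len(terms) < count:
        terms.append(Fraction(1, 2) * terms[-1] + 1)
    return terms


terms = sequence_terms(10)

# Claim (2), checked on consecutive pairs of the computed terms.
is_increasing = all(x <= y for x, y in zip(terms, terms[1:]))
```

Using `Fraction` rather than floating point keeps each term exact, so the comparison is not affected by rounding.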

Any discussion about why induction is a valid argumentative technique 
immediately opens up a box of questions about how we understand the natural 
numbers. Earlier, in Section 1.1, we avoided this issue by referencing 
Kronecker’s famous comment that the natural numbers are somehow divinely given. 
Although we will not improve on this explanation here, it should be pointed out 
that a more atheistic and mathematically satisfying approach to N is possible 
from the point of view of axiomatic set theory. This brings us back to a recurring 
theme of this chapter. Pedagogically speaking, the foundations of mathematics 
are best learned and appreciated in a kind of reverse order. A rigorous study of 
the natural numbers and the theory of sets is certainly recommended, but only 
after we have an understanding of the subtleties of the real number system. It 
is this latter topic that is the business of real analysis. 

Exercises 

Exercise 1.2.1. (a) Prove that $\sqrt{3}$ is irrational. Does a similar argument 
work to show $\sqrt{6}$ is irrational? 

(b) Where does the proof of Theorem 1.1.1 break down if we try to use it to 
prove $\sqrt{4}$ is irrational? 

Exercise 1.2.2. Show that there is no rational number $r$ satisfying $2^r = 3$. 

Exercise 1.2.3. Decide which of the following represent true statements about 
the nature of sets. For any that are false, provide a specific example where the 
statement in question does not hold. 

(a) If $A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \cdots$ are all sets containing an infinite number of 
elements, then the intersection $\bigcap_{n=1}^{\infty} A_n$ is infinite as well. 

(b) If $A_1 \supseteq A_2 \supseteq A_3 \supseteq A_4 \cdots$ are all finite, nonempty sets of real numbers, 
then the intersection $\bigcap_{n=1}^{\infty} A_n$ is finite and nonempty. 

(c) $A \cap (B \cup C) = (A \cap B) \cup C$. 

(d) $A \cap (B \cap C) = (A \cap B) \cap C$. 

(e) $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. 

Exercise 1.2.4. Produce an infinite collection of sets $A_1, A_2, A_3, \ldots$ with the 
property that every $A_i$ has an infinite number of elements, $A_i \cap A_j = \emptyset$ for all 
$i \ne j$, and $\bigcup_{i=1}^{\infty} A_i = \mathbf{N}$. 

Exercise 1.2.5 (De Morgan’s Laws). Let $A$ and $B$ be subsets of $\mathbf{R}$. 

(a) If $x \in (A \cap B)^c$, explain why $x \in A^c \cup B^c$. This shows that 
$(A \cap B)^c \subseteq A^c \cup B^c$. 

(b) Prove the reverse inclusion $(A \cap B)^c \supseteq A^c \cup B^c$, and conclude that 
$(A \cap B)^c = A^c \cup B^c$. 

(c) Show $(A \cup B)^c = A^c \cap B^c$ by demonstrating inclusion both ways. 

Exercise 1.2.6. (a) Verify the triangle inequality in the special case where 
$a$ and $b$ have the same sign. 

(b) Find an efficient proof for all the cases at once by first demonstrating 
$(a + b)^2 \le (|a| + |b|)^2$. 

(c) Prove $|a - b| \le |a - c| + |c - d| + |d - b|$ for all $a$, $b$, $c$, and $d$. 

(d) Prove $||a| - |b|| \le |a - b|$. (The unremarkable identity $a = a - b + b$ may 
be useful.) 

Exercise 1.2.7. Given a function $f$ and a subset $A$ of its domain, let $f(A)$ 
represent the range of $f$ over the set $A$; that is, $f(A) = \{f(x) : x \in A\}$. 

(a) Let $f(x) = x^2$. If $A = [0, 2]$ (the closed interval $\{x \in \mathbf{R} : 0 \le x \le 2\}$) 
and $B = [1, 4]$, find $f(A)$ and $f(B)$. Does $f(A \cap B) = f(A) \cap f(B)$ in this 
case? Does $f(A \cup B) = f(A) \cup f(B)$? 

(b) Find two sets $A$ and $B$ for which $f(A \cap B) \ne f(A) \cap f(B)$. 

(c) Show that, for an arbitrary function $g : \mathbf{R} \to \mathbf{R}$, it is always true that 
$g(A \cap B) \subseteq g(A) \cap g(B)$ for all sets $A, B \subseteq \mathbf{R}$. 

(d) Form and prove a conjecture about the relationship between $g(A \cup B)$ and 
$g(A) \cup g(B)$ for an arbitrary function $g$. 

Exercise 1.2.8. Here are two important definitions related to a function 
$f : A \to B$. The function $f$ is one-to-one (1-1) if $a_1 \ne a_2$ in $A$ implies that 
$f(a_1) \ne f(a_2)$ in $B$. The function $f$ is onto if, given any $b \in B$, it is possible to find an 
element $a \in A$ for which $f(a) = b$. 

Give an example of each or state that the request is impossible: 

(a) $f : \mathbf{N} \to \mathbf{N}$ that is 1-1 but not onto. 

(b) $f : \mathbf{N} \to \mathbf{N}$ that is onto but not 1-1. 

(c) $f : \mathbf{N} \to \mathbf{Z}$ that is 1-1 and onto. 

Exercise 1.2.9. Given a function $f : D \to \mathbf{R}$ and a subset $B \subseteq \mathbf{R}$, let $f^{-1}(B)$ 
be the set of all points from the domain $D$ that get mapped into $B$; that is, 
$f^{-1}(B) = \{x \in D : f(x) \in B\}$. This set is called the preimage of $B$. 

(a) Let $f(x) = x^2$. If $A$ is the closed interval $[0, 4]$ and $B$ is the closed interval 
$[-1, 1]$, find $f^{-1}(A)$ and $f^{-1}(B)$. Does $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$ 
in this case? Does $f^{-1}(A \cup B) = f^{-1}(A) \cup f^{-1}(B)$? 

(b) The good behavior of preimages demonstrated in (a) is completely general. 
Show that for an arbitrary function $g : \mathbf{R} \to \mathbf{R}$, it is always true that 
$g^{-1}(A \cap B) = g^{-1}(A) \cap g^{-1}(B)$ and $g^{-1}(A \cup B) = g^{-1}(A) \cup g^{-1}(B)$ for 
all sets $A, B \subseteq \mathbf{R}$. 

Exercise 1.2.10. Decide which of the following are true statements. Provide a 
short justification for those that are valid and a counterexample for those that 
are not: 

(a) Two real numbers satisfy $a < b$ if and only if $a < b + \epsilon$ for every $\epsilon > 0$. 

(b) Two real numbers satisfy $a < b$ if $a < b + \epsilon$ for every $\epsilon > 0$. 

(c) Two real numbers satisfy $a \le b$ if and only if $a < b + \epsilon$ for every $\epsilon > 0$. 

Exercise 1.2.11. Form the logical negation of each claim. One trivial way to 
do this is to simply add “It is not the case that. . . ” in front of each assertion. 
To make this interesting, fashion the negation into a positive statement that 
avoids using the word “not” altogether. In each case, make an intuitive guess 
as to whether the claim or its negation is the true statement. 

(a) For all real numbers satisfying $a < b$, there exists an $n \in \mathbf{N}$ such that 
$a + 1/n < b$. 

(b) There exists a real number $x > 0$ such that $x < 1/n$ for all $n \in \mathbf{N}$. 

(c) Between every two distinct real numbers there is a rational number. 

Exercise 1.2.12. Let $y_1 = 6$, and for each $n \in \mathbf{N}$ define $y_{n+1} = (2y_n - 6)/3$. 

(a) Use induction to prove that the sequence satisfies $y_n > -6$ for all $n \in \mathbf{N}$. 

(b) Use another induction argument to show the sequence $(y_1, y_2, y_3, \ldots)$ is 
decreasing. 

Exercise 1.2.13. For this exercise, assume Exercise 1.2.5 has been successfully 
completed. 

(a) Show how induction can be used to conclude that 

$$(A_1 \cup A_2 \cup \cdots \cup A_n)^c = A_1^c \cap A_2^c \cap \cdots \cap A_n^c$$

for any finite $n \in \mathbf{N}$. 

(b) It is tempting to appeal to induction to conclude 

$$\left(\bigcup_{i=1}^{\infty} A_i\right)^c = \bigcap_{i=1}^{\infty} A_i^c,$$

but induction does not apply here. Induction is used to prove that a 
particular statement holds for every value of $n \in \mathbf{N}$, but this does not 
imply the validity of the infinite case. To illustrate this point, find an 
example of a collection of sets $B_1, B_2, B_3, \ldots$ where $\bigcap_{i=1}^{n} B_i \ne \emptyset$ is true 
for every $n \in \mathbf{N}$, but $\bigcap_{i=1}^{\infty} B_i \ne \emptyset$ fails. 

(c) Nevertheless, the infinite version of De Morgan’s Law stated in (b) is a 
valid statement. Provide a proof that does not use induction. 

1.3 The Axiom of Completeness 

What exactly is a real number? In Section 1.1, we got as far as saying that 
the set R of real numbers is an extension of the rational numbers Q in which 
there are no holes or gaps. We want every length along the number line, such 
as $\sqrt{2}$, to correspond to a real number and vice versa. 

We are going to improve on this definition, but as we do so, it is important 
to keep in mind our earlier acknowledgment that whatever precise statements 
we formulate will necessarily rest on other unproven assumptions or undefined 
terms. At some point, we must draw a line and confess that this is what we have 
decided to accept as a reasonable place to start. Naturally, there is some debate 
about where this line should be drawn. One way to view the mathematics of 
the 19th and 20th centuries is as a stalwart attempt to move this line further 
and further back toward some unshakable foundation. The majority of the 
material covered in this book is attributable to the mathematicians working in 
the early and middle parts of the 1800s. Augustin Louis Cauchy (1789-1857), 
Bernhard Bolzano (1781-1848), Niels Henrik Abel (1802-1829), Peter Lejeune 
Dirichlet (1805-1859), Karl Weierstrass (1815-1897), and Bernhard Riemann (1826-1866) all 
figure prominently in the discovery of the theorems that follow. But here is the 
interesting point. Nearly all of this work was done using intuitive assumptions 
about the nature of R quite similar to our own informal understanding at this 
point. Eventually, enough scrutiny was directed at the detailed structure of R 
so that, in the 1870s, a handful of ways to rigorously construct R from Q were 
proposed. 

Following this historical model, our own rigorous construction of R from Q 
is postponed until Section 8.6. By this point, the need for such a construction 
will be more justified and easier to appreciate. In the meantime, we have many 
proofs to write, so it is important to lay down, as explicitly as possible, the 
assumptions that we intend to make about the real numbers. 

An Initial Definition for R 

First, R is a set containing Q. The operations of addition and multiplication 
on Q extend to all of R in such a way that every element of R has an additive 
inverse and every nonzero element of R has a multiplicative inverse. Echoing 
the discussion in Section 1.1, we assume R is a field, meaning that addition 
and multiplication of real numbers are commutative, associative, and the 
distributive property holds. This allows us to perform all of the standard algebraic 
manipulations that are second nature to us. We also assume that the familiar 
properties of the ordering on Q extend to all of R. Thus, for example, such 
deductions as “If $a < b$ and $c > 0$, then $ac < bc$” will be carried out freely 
without much comment. To summarize the situation in the official terminology 
of the subject, we assume that R is an ordered field, which contains Q as a 
subfield. (A rigorous definition of “ordered field” is presented in Section 8.6.) 

Figure 1.3: Definition of sup A and inf A. 

This brings us to the final, and most distinctive, assumption about the real 
number system. We must find some way to clearly articulate what we mean by 
insisting that R does not contain the gaps that permeate Q. Because this is the 
defining difference between the rational numbers and the real numbers, we will 
be excessively precise about how we phrase this assumption, hereafter referred 
to as the Axiom of Completeness. 

Axiom of Completeness. Every nonempty set of real numbers that is bounded 
above has a least upper bound. 

Now, what exactly does this mean? 

Least Upper Bounds and Greatest Lower Bounds 

Let’s first state the relevant definitions, and then look at some examples. 

Definition 1.3.1. A set $A \subseteq \mathbf{R}$ is bounded above if there exists a number $b \in \mathbf{R}$ 
such that $a \le b$ for all $a \in A$. The number $b$ is called an upper bound for $A$. 

Similarly, the set $A$ is bounded below if there exists a lower bound $l \in \mathbf{R}$ 
satisfying $l \le a$ for every $a \in A$. 


Definition 1.3.2. A real number $s$ is the least upper bound for a set $A \subseteq \mathbf{R}$ if 
it meets the following two criteria: 

(i) $s$ is an upper bound for $A$; 

(ii) if $b$ is any upper bound for $A$, then $s \le b$. 

The least upper bound is also frequently called the supremum of the set $A$. 
Although the notation $s = \operatorname{lub} A$ is sometimes used, we will always write 
$s = \sup A$ for the least upper bound. 

The greatest lower bound or infimum for $A$ is defined in a similar way 
(Exercise 1.3.1) and is denoted by $\inf A$ (Fig. 1.3). 

Although a set can have a host of upper bounds, it can have only one least 
upper bound. If $s_1$ and $s_2$ are both least upper bounds for a set $A$, then 
by property (ii) in Definition 1.3.2 we can assert $s_1 \le s_2$ and $s_2 \le s_1$. The 
conclusion is that $s_1 = s_2$ and least upper bounds are unique. 
Example 1.3.3. Let 

$$A = \left\{ \frac{1}{n} : n \in \mathbf{N} \right\}.$$

The set $A$ is bounded above and below. Successful candidates for an upper 
bound include 3, 2, and 3/2. For the least upper bound, we claim $\sup A = 1$. 
To argue this rigorously using Definition 1.3.2, we need to verify that properties 
(i) and (ii) hold. For (i), we just observe that $1 \ge 1/n$ for all choices of $n \in \mathbf{N}$. 
To verify (ii), we begin by assuming we are in possession of some other upper 
bound $b$. Because $1 \in A$ and $b$ is an upper bound for $A$, we must have $1 \le b$. 
This is precisely what property (ii) asks us to show. 

Although we do not quite have the tools we need for a rigorous proof (see 
Theorem 1.4.2), it should be somewhat apparent that $\inf A = 0$. 
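
To get a feel for the two bounds in Example 1.3.3, we can examine a finite portion of $A$. The sketch below is only suggestive: the cutoff of 1000 terms and all variable names are our own assumptions, and no finite sample can establish $\inf A = 0$.

```python
from fractions import Fraction

# First 1000 elements of A = {1/n : n in N}; the cutoff is arbitrary.
first_terms = [Fraction(1, n) for n in range(1, 1001)]

# Property (i): 1 is an upper bound for the sampled elements.
one_is_upper_bound = all(a <= 1 for a in first_terms)

# 1 belongs to A, so every upper bound b must satisfy 1 <= b (property (ii)).
one_is_attained = Fraction(1) in first_terms

# 0 is a lower bound, yet it is not among the sampled elements.
zero_is_lower_bound = all(a > 0 for a in first_terms)
smallest_sampled = min(first_terms)
```

The supremum is attained by an element of the set, while the sampled elements only creep toward 0 without reaching it.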

An important lesson to take from Example 1.3.3 is that sup A and inf A may 
or may not be elements of the set A. This issue is tied to understanding the 
crucial difference between the maximum and the supremum (or the minimum 
and the infimum) of a given set. 

Definition 1.3.4. A real number $a_0$ is a maximum of the set $A$ if $a_0$ is an 
element of $A$ and $a_0 \ge a$ for all $a \in A$. Similarly, a number $a_1$ is a minimum of 
$A$ if $a_1 \in A$ and $a_1 \le a$ for every $a \in A$. 

Example 1.3.5. To belabor the point, consider the open interval 


$$(0, 2) = \{x \in \mathbf{R} : 0 < x < 2\},$$

and the closed interval 

$$[0, 2] = \{x \in \mathbf{R} : 0 \le x \le 2\}.$$

Both sets are bounded above (and below), and both have the same least upper 
bound, namely 2. It is not the case, however, that both sets have a maximum. 
A maximum is a specific type of upper bound that is required to be an element 
of the set in question, and the open interval (0, 2) does not possess such an 
element. Thus, the supremum can exist and not be a maximum, but when a 
maximum exists, then it is also the supremum. 

Let’s turn our attention back to the Axiom of Completeness. Although we 
can see now that not every nonempty bounded set contains a maximum, the 
Axiom of Completeness asserts that every such set does have a least upper 
bound. We are not going to prove this. An axiom in mathematics is an ac- 
cepted assumption, to be used without proof. Preferably, an axiom should be 
an elementary statement about the system in question that is so fundamental 
that it seems to need no justification. Perhaps the Axiom of Completeness fits 
this description, and perhaps it does not. Before deciding, let’s remind ourselves 
why it is not a valid statement about Q. 
Example 1.3.6. Consider again the set 

$$S = \{r \in \mathbf{Q} : r^2 < 2\},$$

and pretend for the moment that our world consists only of rational numbers. 
The set $S$ is certainly bounded above. Taking $b = 2$ works, as does $b = 3/2$. But 
notice what happens as we go in search of the least upper bound. (It may be 
useful here to know that the decimal expansion for $\sqrt{2}$ begins 1.4142 . . . .) We 
might try $b = 142/100$, which is indeed an upper bound, but then we discover 
that $b = 1415/1000$ is an upper bound that is smaller still. Is there a smallest 
one? 

In the rational numbers, there is not. In the real numbers, there is. Back 
in R, the Axiom of Completeness states that we may set $\alpha = \sup S$ and be 
confident that such a number exists. In the next section, we will prove that 
$\alpha^2 = 2$. But according to Theorem 1.1.1, this implies $\alpha$ is not a rational 
number. If we are restricting our attention to only rational numbers, then $\alpha$ 
is not an allowable option for $\sup S$, and the search for a least upper bound 
goes on indefinitely. Whatever rational upper bound is discovered, it is always 
possible to find one smaller. 
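
The search described in Example 1.3.6 can be mechanized. The following sketch, with the hypothetical helper `decimal_upper_bound`, produces for each $k$ the smallest fraction with denominator $10^k$ whose square exceeds 2; any positive rational $b$ with $b^2 > 2$ is an upper bound for $S$, since $r^2 < 2 < b^2$ forces $r < b$.

```python
from fractions import Fraction
from math import isqrt


def decimal_upper_bound(k):
    """Smallest fraction m / 10**k, m a positive integer, with square > 2."""
    # isqrt gives the integer part of sqrt(2 * 10**(2k)); adding 1 steps
    # just past sqrt(2) * 10**k, which is never an integer.
    m = isqrt(2 * 10 ** (2 * k)) + 1
    return Fraction(m, 10 ** k)


bounds = [decimal_upper_bound(k) for k in range(6)]

# Each candidate is a rational upper bound for S ...
all_upper_bounds = all(b * b > 2 for b in bounds)

# ... and, for these six values of k, each is smaller than the last.
strictly_decreasing = all(x > y for x, y in zip(bounds, bounds[1:]))
```

The first four values are 2, 3/2, 142/100, and 1415/1000, matching the candidates tried in the example. No term of this list is the least rational upper bound, because the next refinement can always be computed.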

The tools needed to carry out the computations described in Example 1.3.6 
depend on results about how Q and N fit inside of R. These are discussed in the 
next section. In the meantime, it is possible to prove some intuitive algebraic 
properties of least upper bounds just using the definition. 

Example 1.3.7. Let $A \subseteq \mathbf{R}$ be nonempty and bounded above, and let $c \in \mathbf{R}$. 
Define the set $c + A$ by 

$$c + A = \{c + a : a \in A\}.$$

Then $\sup(c + A) = c + \sup A$. 

To properly verify this we focus separately on each part of Definition 1.3.2. 
Setting $s = \sup A$, we see that $a \le s$ for all $a \in A$, which implies $c + a \le c + s$ for 
all $a \in A$. Thus, $c + s$ is an upper bound for $c + A$ and condition (i) is verified. 

For (ii), let $b$ be an arbitrary upper bound for $c + A$; i.e., $c + a \le b$ for all 
$a \in A$. This is equivalent to $a \le b - c$ for all $a \in A$, from which we conclude that 
$b - c$ is an upper bound for $A$. Because $s$ is the least upper bound of $A$, $s \le b - c$, 
which can be rewritten as $c + s \le b$. This verifies part (ii) of Definition 1.3.2, 
and we conclude $\sup(c + A) = c + \sup A$. 
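
For a finite nonempty set the supremum is simply the maximum, so the identity in Example 1.3.7 can be checked directly in that special case. The set `A`, the shift `c`, and the helper `translate` below are hypothetical choices made for illustration.

```python
from fractions import Fraction


def translate(c, A):
    """Return the set c + A = {c + a : a in A}."""
    return {c + a for a in A}


A = {Fraction(-3), Fraction(1, 2), Fraction(7, 4)}
c = Fraction(5, 2)

# For finite sets, sup coincides with max.
lhs = max(translate(c, A))   # sup(c + A)
rhs = c + max(A)             # c + sup A
```

The interesting content of the example lies in infinite sets, where no maximum need exist and the two-part definition of the supremum has to do the work.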

There is an equivalent and useful way of characterizing least upper bounds. 
As the previous example illustrates, Definition 1.3.2 of the supremum has two 
parts. Part (i) says that sup A must be an upper bound, and part (ii) states 
that it must be the smallest one. The following lemma offers an alternative way 
to restate part (ii). 

Lemma 1.3.8. Assume $s \in \mathbf{R}$ is an upper bound for a set $A \subseteq \mathbf{R}$. Then, 
$s = \sup A$ if and only if for every choice of $\epsilon > 0$, there exists an element $a \in A$ 
satisfying $s - \epsilon < a$. 
Proof. Here is a short rephrasing of the lemma: Given that $s$ is an upper bound, 
$s$ is the least upper bound if and only if any number smaller than $s$ is not an 
upper bound. Putting it this way almost qualifies as a proof, but we will expand 
on what exactly is being said in each direction. 

($\Rightarrow$) For the forward direction, we assume $s = \sup A$ and consider $s - \epsilon$, where 
$\epsilon > 0$ has been arbitrarily chosen. Because $s - \epsilon < s$, part (ii) of Definition 1.3.2 
implies that $s - \epsilon$ is not an upper bound for $A$. If this is the case, then there 
must be some element $a \in A$ for which $s - \epsilon < a$ (because otherwise $s - \epsilon$ would 
be an upper bound). This proves the lemma in one direction. 

($\Leftarrow$) Conversely, assume $s$ is an upper bound with the property that no 
matter how $\epsilon > 0$ is chosen, $s - \epsilon$ is no longer an upper bound for $A$. Notice 
that what this implies is that if $b$ is any number less than $s$, then $b$ is not an 
upper bound. (Just let $\epsilon = s - b$.) To prove that $s = \sup A$, we must verify part 
(ii) of Definition 1.3.2. (Read it again.) Because we have just argued that any 
number smaller than $s$ cannot be an upper bound, it follows that if $b$ is some 
other upper bound for $A$, then $s \le b$. □ 
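
Lemma 1.3.8 asks for a witness $a \in A$ for each $\epsilon > 0$. As a sketch, take the set $A = \{1 - 1/n : n \in \mathbf{N}\}$ with upper bound $s = 1$; this set does not appear in the text and is assumed here purely for illustration, as is the function name `witness`.

```python
from fractions import Fraction
from math import floor


def witness(eps):
    """Return an element a = 1 - 1/n of A satisfying 1 - eps < a.

    Assumes eps is a positive Fraction.
    """
    n = floor(1 / eps) + 1        # n > 1/eps, hence 1/n < eps
    return 1 - Fraction(1, n)


checks = []
for eps in (Fraction(1, 2), Fraction(1, 10), Fraction(3, 1000)):
    a = witness(eps)
    # s - eps < a holds, and a < s confirms a really comes from A below s.
    checks.append(1 - eps < a < 1)
```

Because a witness exists for every $\epsilon$ and 1 is an upper bound, the lemma gives $\sup A = 1$ for this set, even though 1 itself is not an element of $A$.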

It is certainly the case that all of our conclusions to this point about least 
upper bounds have analogous versions for greatest lower bounds. The Axiom of 
Completeness does not explicitly assert that a nonempty set bounded below has 
an infimum, but this is because we do not need to assume this fact as part of 
the axiom. Using the Axiom of Completeness, there are several ways to prove 
that greatest lower bounds exist for nonempty bounded sets. One such proof is 
explored in Exercise 1.3.3. 

Exercises 

Exercise 1.3.1. (a) Write a formal definition in the style of Definition 1.3.2 
for the infimum or greatest lower bound of a set. 

(b) Now, state and prove a version of Lemma 1.3.8 for greatest lower bounds. 

Exercise 1.3.2. Give an example of each of the following, or state that the 
request is impossible. 

(a) A set $B$ with $\inf B \ge \sup B$. 

(b) A finite set that contains its infimum but not its supremum. 

(c) A bounded subset of Q that contains its supremum but not its infimum. 

Exercise 1.3.3. (a) Let $A$ be nonempty and bounded below, and define 
$B = \{b \in \mathbf{R} : b \text{ is a lower bound for } A\}$. Show that $\sup B = \inf A$. 

(b) Use (a) to explain why there is no need to assert that greatest lower bounds 
exist as part of the Axiom of Completeness. 

Exercise 1.3.4. Let $A_1, A_2, A_3, \ldots$ be a collection of nonempty sets, each of 
which is bounded above. 

(a) Find a formula for $\sup(A_1 \cup A_2)$. Extend this to $\sup\left(\bigcup_{k=1}^{n} A_k\right)$. 

(b) Consider $\sup\left(\bigcup_{k=1}^{\infty} A_k\right)$. Does the formula in (a) extend to the infinite 
case? 

Exercise 1.3.5. As in Example 1.3.7, let $A \subseteq \mathbf{R}$ be nonempty and bounded 
above, and let $c \in \mathbf{R}$. This time define the set $cA = \{ca : a \in A\}$. 

(a) If $c \ge 0$, show that $\sup(cA) = c \sup A$. 

(b) Postulate a similar type of statement for $\sup(cA)$ for the case $c < 0$. 

Exercise 1.3.6. Given sets $A$ and $B$, define $A + B = \{a + b : a \in A \text{ and } b \in B\}$. 
Follow these steps to prove that if $A$ and $B$ are nonempty and bounded above 
then $\sup(A + B) = \sup A + \sup B$. 

(a) Let $s = \sup A$ and $t = \sup B$. Show $s + t$ is an upper bound for $A + B$. 

(b) Now let $u$ be an arbitrary upper bound for $A + B$, and temporarily fix 
$a \in A$. Show $t \le u - a$. 

(c) Finally, show $\sup(A + B) = s + t$. 

(d) Construct another proof of this same fact using Lemma 1.3.8. 

Exercise 1.3.7. Prove that if a is an upper bound for A, and if a is also an 
element of A, then it must be that a = sup A. 

Exercise 1.3.8. Compute, without proofs, the suprema and infima (if they 
exist) of the following sets: 

(a) $\{m/n : m, n \in \mathbf{N} \text{ with } m < n\}$. 

(b) $\{(-1)^m/n : m, n \in \mathbf{N}\}$. 

(c) $\{n/(3n + 1) : n \in \mathbf{N}\}$. 

(d) $\{m/(m + n) : m, n \in \mathbf{N}\}$. 

Exercise 1.3.9. (a) If $\sup A < \sup B$, show that there exists an element 
$b \in B$ that is an upper bound for $A$. 

(b) Give an example to show that this is not always the case if we only assume 
$\sup A \le \sup B$. 

Exercise 1.3.10 (Cut Property). The Cut Property of the real numbers is 
the following: 

If $A$ and $B$ are nonempty, disjoint sets with $A \cup B = \mathbf{R}$ and $a < b$ for all 
$a \in A$ and $b \in B$, then there exists $c \in \mathbf{R}$ such that $x \le c$ whenever $x \in A$ and 
$x \ge c$ whenever $x \in B$. 

(a) Use the Axiom of Completeness to prove the Cut Property. 

(b) Show that the implication goes the other way; that is, assume R possesses 
the Cut Property and let $E$ be a nonempty set that is bounded above. 
Prove $\sup E$ exists. 

(c) The punchline of parts (a) and (b) is that the Cut Property could be used 
in place of the Axiom of Completeness as the fundamental axiom that 
distinguishes the real numbers from the rational numbers. To drive this 
point home, give a concrete example showing that the Cut Property is not 
a valid statement when R is replaced by Q. 

Exercise 1.3.11. Decide if the following statements about suprema and infima 
are true or false. Give a short proof for those that are true. For any that are 
false, supply an example where the claim in question does not appear to hold. 

(a) If $A$ and $B$ are nonempty, bounded, and satisfy $A \subseteq B$, then 
$\sup A \le \sup B$. 

(b) If $\sup A < \inf B$ for sets $A$ and $B$, then there exists a $c \in \mathbf{R}$ satisfying 
$a < c < b$ for all $a \in A$ and $b \in B$. 

(c) If there exists a $c \in \mathbf{R}$ satisfying $a < c < b$ for all $a \in A$ and $b \in B$, then 
$\sup A < \inf B$. 

1.4 Consequences of Completeness 

The first application of the Axiom of Completeness is a result that may look 
like a more natural way to mathematically express the sentiment that the real 
line contains no gaps. 

Theorem 1.4.1 (Nested Interval Property). For each $n \in \mathbf{N}$, assume we 
are given a closed interval $I_n = [a_n, b_n] = \{x \in \mathbf{R} : a_n \le x \le b_n\}$. Assume 
also that each $I_n$ contains $I_{n+1}$. Then, the resulting nested sequence of closed 
intervals 

$$I_1 \supseteq I_2 \supseteq I_3 \supseteq I_4 \supseteq \cdots$$

has a nonempty intersection; that is, $\bigcap_{n=1}^{\infty} I_n \ne \emptyset$. 

Proof. In order to show that $\bigcap_{n=1}^{\infty} I_n$ is not empty, we are going to use the 
Axiom of Completeness (AoC) to produce a single real number $x$ satisfying 
$x \in I_n$ for every $n \in \mathbf{N}$. Now, AoC is a statement about bounded sets, and the 
one we want to consider is the set 

$$A = \{a_n : n \in \mathbf{N}\}$$

of left-hand endpoints of the intervals. 

[Figure: the left-hand endpoints $a_1, a_2, a_3, \ldots, a_n, \ldots$ of the set $A = \{a_n : n \in \mathbf{N}\}$ and the right-hand endpoints $\ldots, b_n, \ldots, b_3, b_2, b_1$ marked in order along the number line.] 

Because the intervals are nested, we see that every $b_n$ serves as an upper bound 
for $A$. Thus, we are justified in setting 

$$x = \sup A.$$

Now, consider a particular $I_n = [a_n, b_n]$. Because $x$ is an upper bound for $A$, 
we have $a_n \le x$. The fact that each $b_n$ is an upper bound for $A$ and that $x$ is 
the least upper bound implies $x \le b_n$. 

Altogether then, we have $a_n \le x \le b_n$, which means $x \in I_n$ for every choice 
of $n \in \mathbf{N}$. Hence, $x \in \bigcap_{n=1}^{\infty} I_n$, and the intersection is not empty. □ 
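
The key observation in the proof, that every right-hand endpoint is an upper bound for the set of left-hand endpoints, can be watched in action. The sketch below builds nested intervals by bisection around a number whose square is 2; the bisection scheme, the starting interval $[1, 2]$, and the name `bisection_intervals` are our own assumptions and are not part of the theorem.

```python
from fractions import Fraction


def bisection_intervals(count):
    """Return count nested intervals (a_n, b_n) with a_n**2 < 2 < b_n**2."""
    a, b = Fraction(1), Fraction(2)
    intervals = [(a, b)]
    while len(intervals) < count:
        mid = (a + b) / 2
        if mid * mid < 2:
            a = mid          # keep the right half
        else:
            b = mid          # keep the left half
        intervals.append((a, b))
    return intervals


intervals = bisection_intervals(20)
lefts = [a for a, _ in intervals]
rights = [b for _, b in intervals]

# Each interval contains the next one.
nested = all(a1 <= a2 and b2 <= b1
             for (a1, b1), (a2, b2) in zip(intervals, intervals[1:]))

# Every b_m is an upper bound for the set of left-hand endpoints.
rights_bound_lefts = all(a <= b for a in lefts for b in rights)
```

With rational endpoints only, the intersection of all the intervals would contain no rational number at all; it is the Axiom of Completeness that supplies the point $x = \sup A$.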


The Density of Q in R 

The set Q is an extension of N, and R in turn is an extension of Q. The next 
few results indicate how N and Q sit inside of R. 

Theorem 1.4.2 (Archimedean Property). (i) Given any number $x \in \mathbf{R}$, 
there exists an $n \in \mathbf{N}$ satisfying $n > x$. 

(ii) Given any real number $y > 0$, there exists an $n \in \mathbf{N}$ satisfying $1/n < y$. 

Proof. Part (i) of the proposition states that N is not bounded above. There 
has never been any doubt about the truth of this, and it could be reasonably 
argued that we should not have to prove it at all, especially in light of the fact 
that we have decided to take other familiar properties of N, Z, and Q as given. 

The counterargument is that there is still a great deal of mystery about 
what the real numbers actually are. What we have said so far is that R is an 
extension of Q that maintains the algebraic and order properties of the rationals 
but also possesses the least upper bound property articulated in the Axiom of 
Completeness. In the absence of any other information about R, we have to 
consider the possibility that in extending Q we unwittingly acquired some new 
numbers that are upper bounds for N. In fact, as disorienting as it may sound, 
there are ordered field extensions of Q that include “numbers” bigger than every 
natural number. Theorem 1.4.2 asserts that the real numbers do not contain 
such exotic creatures. The Axiom of Completeness, which we adopted to patch 
up the holes in Q, carries with it the implication that N is an unbounded subset 
of R. 

And so to the proof. Assume, for contradiction, that N is bounded above. 
By the Axiom of Completeness (AoC), N should then have a least upper bound, 
and we can set $\alpha = \sup \mathbf{N}$. If we consider $\alpha - 1$, then we no longer have an 
upper bound (see Lemma 1.3.8), and therefore there exists an $n \in \mathbf{N}$ satisfying 
$\alpha - 1 < n$. But this is equivalent to $\alpha < n + 1$. Because $n + 1 \in \mathbf{N}$, we have 
a contradiction to the fact that $\alpha$ is supposed to be an upper bound for N. 
(Notice that the contradiction here depends only on AoC and the fact that N 
is closed under addition.) 

Part (ii) follows from (i) by letting $x = 1/y$. □ 
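
For rational inputs, both parts of the Archimedean Property can be made completely explicit. The sketch below uses hypothetical function names and mirrors the proof's last step by obtaining part (ii) from part (i) with $x = 1/y$; rational numbers are only a stand-in for the reals here.

```python
from fractions import Fraction
from math import floor


def natural_above(x):
    """Part (i): return a natural number n with n > x."""
    return max(1, floor(x) + 1)


def reciprocal_below(y):
    """Part (ii): return a natural number n with 1/n < y, assuming y > 0."""
    return natural_above(1 / y)   # apply part (i) to x = 1/y


n = reciprocal_below(Fraction(2, 7))
part_two_holds = Fraction(1, n) < Fraction(2, 7)
```

The `max(1, ...)` guard only ensures the result is a natural number when $x$ is negative.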

This familiar property of N is the key to an extremely important fact about 
how Q fits inside of R. 
Theorem 1.4.3 (Density of Q in R). For every two real numbers $a$ and $b$ 
with $a < b$, there exists a rational number $r$ satisfying $a < r < b$. 

Proof. A rational number is a quotient of integers, so we must produce m £ Z 
and n £ N so that 

(1) a < — < b. 

n 

The first step is to choose the denominator n large enough so that consecutive 
increments of size 1/n are too close together to “step over” the interval (a, b). 

[Figure: a number line marked at 0, 1/n, 2/n, 3/n, ..., (m − 1)/n, m/n, with
the mark m/n falling between a and b.]

Using the Archimedean Property (Theorem 1.4.2), we may pick n ∈ N large
enough so that

(2)                     1/n < b − a.

Inequality (1) (which we are trying to prove) is equivalent to na < m < nb.
With n already chosen, the idea now is to choose m to be the smallest integer
greater than na. In other words, pick m ∈ Z so that

                        m − 1 ≤ na < m,

where (3) labels the first inequality, m − 1 ≤ na, and (4) labels the second,
na < m.

Now, inequality (4) immediately yields a < m/n, which is half of the battle.
Keeping in mind that inequality (2) is equivalent to a < b − 1/n, we can use (3)
to write

                        m ≤ na + 1
                          < n(b − 1/n) + 1
                          = nb.

Because m < nb implies m/n < b, we have a < m/n < b, as desired. □

Theorem 1.4.3 is paraphrased by saying that Q is dense in R. Without
working too hard, we can use this result to show that the irrational numbers
are dense in R as well.

Corollary 1.4.4. Given any two real numbers a < b, there exists an irrational
number t satisfying a < t < b.

Proof. Exercise 1.4.5. □
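
The two choices made in the proof of Theorem 1.4.3 (first n, then m) can be traced in a short computation. The following is only a sketch: the function name rational_between is hypothetical, and the inputs are assumed to be rational (or exactly representable) so that Python's Fraction type can carry out the arithmetic exactly.

```python
from fractions import Fraction
import math

def rational_between(a, b):
    """Return a Fraction m/n with a < m/n < b, mirroring the proof's two steps."""
    a, b = Fraction(a), Fraction(b)
    if not a < b:
        raise ValueError("need a < b")
    # Archimedean step: any integer n > 1/(b - a) satisfies 1/n < b - a.
    n = math.floor(1 / (b - a)) + 1
    # m is the smallest integer greater than n*a, so m - 1 <= n*a < m.
    m = math.floor(n * a) + 1
    return Fraction(m, n)

r = rational_between(Fraction(1, 3), Fraction(1, 2))
print(r)  # a rational strictly between 1/3 and 1/2
```

The code does not replace the proof; it only shows that, once n and m are chosen as described, the inequality a < m/n < b can be checked directly.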

The Existence of Square Roots


It is time to tend to some unfinished business left over from Example 1.3.6 and 
this chapter’s opening discussion. 

Theorem 1.4.5. There exists a real number a ∈ R satisfying a^2 = 2.


Proof. After reviewing Example 1.3.6, consider the set

                        T = {t ∈ R : t^2 < 2}

and set a = sup T. We are going to prove a^2 = 2 by ruling out the possibilities
a^2 < 2 and a^2 > 2. Keep in mind that there are two parts to the definition of
sup T, and they will both be important. (This always happens when a supremum
is used in an argument.) The strategy is to demonstrate that a^2 < 2 violates
the fact that a is an upper bound for T, and a^2 > 2 violates the fact that it is
the least upper bound.

Let’s first see what happens if we assume a^2 < 2. In search of an element of
T that is larger than a, write

                (a + 1/n)^2 = a^2 + 2a/n + 1/n^2
                            ≤ a^2 + 2a/n + 1/n
                            = a^2 + (2a + 1)/n.

But now assuming a^2 < 2 gives us a little space in which to fit the (2a + 1)/n
term and keep the total less than 2. Specifically, choose n_0 ∈ N large enough
so that

                1/n_0 < (2 − a^2)/(2a + 1).

This implies (2a + 1)/n_0 < 2 − a^2, and consequently that

                (a + 1/n_0)^2 < a^2 + (2 − a^2) = 2.

Thus, a + 1/n_0 ∈ T, contradicting the fact that a is an upper bound for T. We
conclude that a^2 < 2 cannot happen.

Now, what about the case a^2 > 2? This time, write

                (a − 1/n)^2 = a^2 − 2a/n + 1/n^2
                            > a^2 − 2a/n.

The remainder of the argument is requested in Exercise 1.4.7. □
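
The first case of the proof can be tried out numerically. In the sketch below, a rational number with a^2 < 2 stands in for the candidate a (the actual supremum is of course not rational), and the hypothetical function step_up chooses n_0 exactly as in the proof and returns a + 1/n_0, which is larger than a and still has square less than 2.

```python
from fractions import Fraction
import math

def step_up(a):
    """Given rational a >= 0 with a^2 < 2, return a + 1/n0 with (a + 1/n0)^2 < 2."""
    a = Fraction(a)
    if a < 0 or a * a >= 2:
        raise ValueError("need a >= 0 and a^2 < 2")
    # Choose n0 so that 1/n0 < (2 - a^2)/(2a + 1), i.e. n0 > (2a + 1)/(2 - a^2).
    n0 = math.floor((2 * a + 1) / (2 - a * a)) + 1
    return a + Fraction(1, n0)

a = Fraction(7, 5)          # 49/25 < 2
bigger = step_up(a)
print(bigger, bigger ** 2 < 2)
```

Repeated calls produce an increasing sequence of elements of T, which illustrates why no number with a^2 < 2 can serve as an upper bound for T.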



A small modification of this proof can be made to show that √x exists for
any x ≥ 0. A formula for expanding (a + 1/n)^m called the binomial formula
can be used to show that the mth root of x exists for arbitrary values of m ∈ N.

Exercises


Exercise 1.4.1. Recall that I stands for the set of irrational numbers.

(a) Show that if a, b ∈ Q, then ab and a + b are elements of Q as well.

(b) Show that if a ∈ Q and t ∈ I, then a + t ∈ I and at ∈ I as long as a ≠ 0.

(c) Part (a) can be summarized by saying that Q is closed under addition and
multiplication. Is I closed under addition and multiplication? Given two
irrational numbers s and t, what can we say about s + t and st?

Exercise 1.4.2. Let A ⊆ R be nonempty and bounded above, and let s ∈ R
have the property that for all n ∈ N, s + 1/n is an upper bound for A and s − 1/n
is not an upper bound for A. Show s = sup A.

Exercise 1.4.3. Prove that ⋂_{n=1}^∞ (0, 1/n) = ∅. Notice that this demonstrates
that the intervals in the Nested Interval Property must be closed for the
conclusion of the theorem to hold.


Exercise 1.4.4. Let a < b be real numbers and consider the set T = Q ∩ [a, b].
Show sup T = b.


Exercise 1.4.5. Using Exercise 1.4.1, supply a proof for Corollary 1.4.4 by
considering the real numbers a − √2 and b − √2.

Exercise 1.4.6. Recall that a set B is dense in R if an element of B can be 
found between any two real numbers a < b. Which of the following sets are 
dense in R? Take p ∈ Z and q ∈ N in every case.

(a) The set of all rational numbers p/q with q ≤ 10.

(b) The set of all rational numbers p/q with q a power of 2.

(c) The set of all rational numbers p/q with 10|p| ≥ q.


Exercise 1.4.7. Finish the proof of Theorem 1.4.5 by showing that the
assumption a^2 > 2 leads to a contradiction of the fact that a = sup T.


Exercise 1.4.8. Give an example of each or state that the request is impossible. 
When a request is impossible, provide a compelling argument for why this is 
the case. 


(a) Two sets A and B with A ∩ B = ∅, sup A = sup B, sup A ∉ A and
sup B ∉ B.

(b) A sequence of nested open intervals J_1 ⊇ J_2 ⊇ J_3 ⊇ ··· with ⋂_{n=1}^∞ J_n
nonempty but containing only a finite number of elements.

(c) A sequence of nested unbounded closed intervals L_1 ⊇ L_2 ⊇ L_3 ⊇ ···
with ⋂_{n=1}^∞ L_n = ∅. (An unbounded closed interval has the form [a, ∞) =
{x ∈ R : x ≥ a}.)

(d) A sequence of closed (not necessarily nested) intervals I_1, I_2, I_3, ... with
the property that ⋂_{n=1}^N I_n ≠ ∅ for all N ∈ N, but ⋂_{n=1}^∞ I_n = ∅.

1.5 Cardinality

The applications of the Axiom of Completeness to this point have basically 
served to restore our confidence in properties we already felt we knew about the 
real number system. One final consequence of completeness that we are about 
to explore is of a very different nature and, on its own, represents an astounding 
intellectual discovery. The traditional way that mathematics gets done is by 
one mathematician modifying and expanding on the work of those who came 
before. This model does not seem to apply to Georg Cantor (1845-1918), at 
least with regard to his work on the theory of infinite sets. 

At the moment, we have an image of R as consisting of rational and irrational 
numbers, continuously packed together along the real line. We have seen that 
both Q and I (the set of irrationals) are dense in R, meaning that in every 
interval (a, b) there exist rational and irrational numbers alike. Mentally, there
is a temptation to think of Q and I as being intricately mixed together in equal 
proportions, but this turns out not to be the case. In a way that Cantor made 
precise, the irrational numbers far outnumber the rational numbers in making 
up the real line. 

1-1 Correspondence

The term cardinality is used in mathematics to refer to the size of a set. The 
cardinalities of finite sets can be compared simply by attaching a natural number 
to each set. The set of Snow White’s dwarfs is smaller than the set of United 
States Supreme Court Justices because 7 is less than 9. But how might we 
draw this same conclusion without referring to any numbers? Cantor’s idea was 
to attempt to put the sets into a 1-1 correspondence with each other. There 
are fewer dwarfs than Justices because, if the dwarfs were all simultaneously 
appointed to the bench, there would still be two empty chairs to fill. On the 
other hand, the cardinality of the Supreme Court is the same as the cardinality 
of the set of fielders on a baseball team. This is because, when the judges take 
the field, it is possible to arrange them so that there is exactly one judge at 
every position. 

The advantage of this method of comparing the sizes of sets is that it works 
equally well on sets that are infinite. 

Definition 1.5.1. A function f : A → B is one-to-one (1-1) if a_1 ≠ a_2 in A
implies that f(a_1) ≠ f(a_2) in B. The function f is onto if, given any b ∈ B, it
is possible to find an element a ∈ A for which f(a) = b.

A function f : A → B that is both 1-1 and onto provides us with exactly
what we mean by a 1-1 correspondence between two sets. The property of 
being 1-1 means that no two elements of A correspond to the same element of 
B (no two judges are playing the same position), and the property of being onto 
ensures that every element of B corresponds to something in A (there is a judge 
at every position). 
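
For finite sets, both properties in Definition 1.5.1 can be checked by brute force. The sketch below is only an illustration: the helper names is_one_to_one and is_onto are hypothetical, and the sets of dwarfs and chairs are modeled simply as ranges of integers.

```python
def is_one_to_one(f, A):
    """True if distinct elements of A have distinct images under f."""
    images = [f(a) for a in A]
    return len(set(images)) == len(images)

def is_onto(f, A, B):
    """True if every element of B is f(a) for some a in A."""
    return set(B) <= {f(a) for a in A}

dwarfs = range(1, 8)     # 7 dwarfs
chairs = range(1, 10)    # 9 seats on the bench
seat = lambda d: d       # dwarf number d takes chair number d

print(is_one_to_one(seat, dwarfs))    # no two dwarfs share a chair
print(is_onto(seat, dwarfs, chairs))  # but some chairs stay empty
```

This approach cannot be used on infinite sets, which is precisely why the definitions are phrased in terms of functions rather than counting.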

Definition 1.5.2. The set A has the same cardinality as B if there exists
f : A → B that is 1-1 and onto. In this case, we write A ~ B.

Example 1.5.3. (i) If we let E = {2, 4, 6, ...} be the set of even natural
numbers, then we can show N ~ E. To see why, let f : N → E be given
by f(n) = 2n.

        N :   1    2    3    4   ···   n   ···
              ↕    ↕    ↕    ↕         ↕
        E :   2    4    6    8   ···   2n  ···


It is certainly true that E is a proper subset of N, and for this reason it 
may seem logical to say that E is a “smaller” set than N. This is one 
way to look at it, but it represents a point of view that is heavily biased 
from an overexposure to finite sets. The definition of cardinality is quite 
specific, and from this point of view E and N are equivalent. 

(ii) To make this point again, note that although N is contained in Z as a
proper subset, we can show N ~ Z. This time let

        f(n) = (n − 1)/2    if n is odd,
        f(n) = −n/2         if n is even.

The important details to verify are that f does not map any two natural
numbers to the same element of Z (f is 1-1) and that every element of Z
gets “hit” by something in N (f is onto).

        N :   1    2    3    4    5    6    7   ···
              ↕    ↕    ↕    ↕    ↕    ↕    ↕
        Z :   0   −1    1   −2    2   −3    3   ···
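
The correspondence in part (ii) is simple enough to compute directly. In the sketch below, f is the function from the example, while f_inverse is an assumed name for a candidate inverse that is not given in the text; checking that the two undo each other on a range of values is evidence (not a proof) that f is 1-1 and onto.

```python
def f(n):
    """The map N -> Z from Example 1.5.3 (ii)."""
    if n % 2 == 1:
        return (n - 1) // 2
    return -(n // 2)

def f_inverse(z):
    """Candidate inverse Z -> N: nonnegative integers go to odd n, negatives to even n."""
    return 2 * z + 1 if z >= 0 else -2 * z

print([f(n) for n in range(1, 8)])
print(all(f_inverse(f(n)) == n for n in range(1, 1000)))
```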


Example 1.5.4. A little calculus (which we will not supply) shows that the
function f(x) = x/(x^2 − 1) takes the interval (−1, 1) onto R in a 1-1 fashion
(Fig. 1.4). Thus (−1, 1) ~ R. In fact, (a, b) ~ R for any interval (a, b).


Countable Sets 

Definition 1.5.5. A set A is countable if N ~ A. An infinite set that is not 
countable is called an uncountable set. 

Figure 1.4: (−1, 1) ~ R using f(x) = x/(x^2 − 1).


From Example 1.5.3, we see that both E and Z are countable sets. Putting 
a set into a 1-1 correspondence with N, in effect, means putting all of the 
elements into an infinitely long list or sequence. Looking at Example 1.5.3, we 
can see that this was quite easy to do for E and required only a modest bit 
of shuffling for the set Z. A natural question arises as to whether all infinite 
sets are countable. Given some infinite set such as Q or R, it might seem as 
though, with enough cleverness, we should be able to fit all the elements of our 
set into a single list (i.e., into a correspondence with N). After all, this list is 
infinitely long so there should be plenty of room. But alas, as Hardy remarks, 
“[The mathematician’s] subject is the most curious of all — there is none in which 
truth plays such odd pranks.” 

Theorem 1.5.6. (i) The set Q is countable. (ii) The set R is uncountable.

Proof. (i) Set A_1 = {0} and for each n ≥ 2, let A_n be the set given by

        A_n = {±p/q : where p, q ∈ N are in lowest terms with p + q = n}.

The first few of these sets look like

        A_1 = {0},   A_2 = {1/1, −1/1},   A_3 = {1/2, −1/2, 2/1, −2/1},

        A_4 = {1/3, −1/3, 3/1, −3/1},

        and A_5 = {1/4, −1/4, 2/3, −2/3, 3/2, −3/2, 4/1, −4/1}.

The crucial observation is that each A_n is finite and every rational number
appears in exactly one of these sets. Our 1-1 correspondence with N is then
achieved by consecutively listing the elements in each A_n.

Admittedly, writing an explicit formula for this correspondence would be an
awkward task, and attempting to do so is not the best use of time. What
matters is that we see why every rational number appears in the correspondence
exactly once. Given, say, 22/7, we have that 22/7 ∈ A_29. Because the set of
elements in A_1, ..., A_28 is finite, we can be confident that 22/7 eventually gets
included in the sequence. The fact that this line of reasoning applies to any
rational number p/q is our proof that the correspondence is onto. To verify
that it is 1-1, we observe that the sets A_n were constructed to be disjoint so
that no rational number appears twice. This completes the proof of (i).
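
Although an explicit formula is awkward, the listing itself is easy to generate. The sketch below is one possible rendering: the names A, rationals, and position are hypothetical, and the order of elements inside each A_n (increasing p, positive before negative) is a choice made here to match the sets displayed above.

```python
from fractions import Fraction
from itertools import count
from math import gcd

def A(n):
    """The finite set A_n from the proof, returned as a list."""
    if n == 1:
        return [Fraction(0)]
    out = []
    for p in range(1, n):
        q = n - p
        if gcd(p, q) == 1:                      # p/q in lowest terms
            out += [Fraction(p, q), Fraction(-p, q)]
    return out

def rationals():
    """Yield A_1, then A_2, then A_3, ... one element at a time."""
    for n in count(1):
        yield from A(n)

def position(r):
    """Index (starting at 1) at which the rational r appears in the list."""
    for k, x in enumerate(rationals(), start=1):
        if x == r:
            return k

print(Fraction(22, 7) in A(29))
print(position(Fraction(22, 7)))
```

The call to position terminates for every rational input because only finitely many elements precede it, which is the content of the onto argument in the proof.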

(ii) The second statement of Theorem 1.5.6 is the truly unexpected part,
and its proof is done by contradiction. Assume that there does exist a 1-1,
onto function f : N → R. Again, what this suggests is that it is possible to
enumerate the elements of R. If we let x_1 = f(1), x_2 = f(2), and so on, then
our assumption that f is onto means that we can write

(1)                     R = {x_1, x_2, x_3, x_4, ...}

and be confident that every real number appears somewhere on the list. We 
will now use the Nested Interval Property (Theorem 1.4.1) to produce a real 
number that is not there. 

Let I_1 be a closed interval that does not contain x_1. Next, let I_2 be a closed
interval, contained in I_1, which does not contain x_2. The existence of such an
I_2 is easy to verify. Certainly I_1 contains two smaller disjoint closed intervals,
and x_2 can only be in one of these. In general, given an interval I_n, construct
I_{n+1} to satisfy

(i) I_{n+1} ⊆ I_n and

(ii) x_{n+1} ∉ I_{n+1}.








We now consider the intersection ⋂_{n=1}^∞ I_n. If x_{n_0} is some real number from the
list in (1), then we have x_{n_0} ∉ I_{n_0}, and it follows that

                        x_{n_0} ∉ ⋂_{n=1}^∞ I_n.

Now, we are assuming that the list in (1) contains every real number, and this
leads to the conclusion that

                        ⋂_{n=1}^∞ I_n = ∅.

However, the Nested Interval Property (NIP) asserts that ⋂_{n=1}^∞ I_n ≠ ∅. By
NIP, there is at least one x ∈ ⋂_{n=1}^∞ I_n that, consequently, cannot be on the list
in (1). This contradiction means that such an enumeration of R is impossible,
and we conclude that R is an uncountable set. □
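
The interval construction in part (ii) can be imitated for a finite list of numbers. The sketch below makes several assumptions that are not in the text: it starts from [0, 1], it works with exact rationals, and at each stage it chooses between the left and right thirds of the current interval (two disjoint closed subintervals). It illustrates only the inductive step; the infinite case depends on the Nested Interval Property and cannot be simulated.

```python
from fractions import Fraction

def nested_intervals_avoiding(xs, lo=Fraction(0), hi=Fraction(1)):
    """Return closed intervals I_1, I_2, ... (as pairs), each inside the
    previous one, such that the k-th interval does not contain xs[k-1]."""
    intervals = []
    for x in xs:
        third = (hi - lo) / 3
        left = (lo, lo + third)
        right = (hi - third, hi)
        # left and right are disjoint, so x lies in at most one of them;
        # keep whichever one is guaranteed to miss x.
        lo, hi = right if left[0] <= x <= left[1] else left
        intervals.append((lo, hi))
    return intervals

xs = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 4), Fraction(2, 9)]
ivs = nested_intervals_avoiding(xs)
lo, hi = ivs[-1]
print((lo + hi) / 2 in xs)   # a point of the last interval is not on the list
```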

What exactly should we make of this discovery? It is an important exercise 
to show that any subset of a countable set must be either countable or finite. 
This should not be too surprising. If a set can be arranged into a single list, then 
deleting some elements from this list results in another (shorter, and potentially 
terminating) list. This means that countable sets are the smallest type of infinite 
set. Anything smaller is either still countable or finite. 

The force of Theorem 1.5.6 is that the cardinality of R is, informally speak- 
ing, a larger type of infinity. The real numbers so outnumber the natural num- 
bers that there is no way to map N onto R. No matter how we attempt this, 
there are always real numbers to spare. The set Q, on the other hand, is count- 
able. As far as infinite sets are concerned, this is as small as it gets. What does 
this imply about the set I of irrational numbers? By imitating the demonstra- 
tion that N ~ Z, we can prove that the union of two countable sets must be 
countable. Because R = Q U I, it follows that I cannot be countable because 
otherwise R would be. The inescapable conclusion is that, despite the fact that 
we have encountered so few of them, the irrational numbers form a far greater 
subset of R than Q. 

The properties of countable sets described in this discussion are useful for a 
few exercises in upcoming chapters. For easier reference, we state them as some 
final propositions and outline their proofs in the exercises that follow. 

Theorem 1.5.7. If A ⊆ B and B is countable, then A is either countable or
finite.

Theorem 1.5.8. (i) If A_1, A_2, ..., A_m are each countable sets, then the union
A_1 ∪ A_2 ∪ ··· ∪ A_m is countable.

(ii) If A_n is a countable set for each n ∈ N, then ⋃_{n=1}^∞ A_n is countable.

Exercises 

Exercise 1.5.1. Finish the following proof for Theorem 1.5.7.

Assume B is a countable set. Thus, there exists f : N → B, which is 1-1
and onto. Let A ⊆ B be an infinite subset of B. We must show that A is
countable.

Let n_1 = min{n ∈ N : f(n) ∈ A}. As a start to a definition of g : N → A,
set g(1) = f(n_1). Show how to inductively continue this process to produce a
1-1 function g from N onto A.

Exercise 1.5.2. Review the proof of Theorem 1.5.6, part (ii) showing that R
is uncountable, and then find the flaw in the following erroneous proof that Q
is uncountable:

Assume, for contradiction, that Q is countable. Thus we can write Q =
{r_1, r_2, r_3, ...} and, as before, construct a nested sequence of closed intervals
with r_n ∉ I_n. Our construction implies ⋂_{n=1}^∞ I_n = ∅ while NIP implies
⋂_{n=1}^∞ I_n ≠ ∅. This contradiction implies Q must therefore be uncountable.

Exercise 1.5.3. Use the following outline to supply proofs for the statements 
in Theorem 1.5.8. 


(a) First, prove statement (i) for two countable sets, A_1 and A_2.
Example 1.5.3 (ii) may be a useful reference. Some technicalities can be avoided
by first replacing A_2 with the set B_2 = A_2 \ A_1 = {x ∈ A_2 : x ∉ A_1}. The
point of this is that the union A_1 ∪ B_2 is equal to A_1 ∪ A_2 and the sets
A_1 and B_2 are disjoint. (What happens if B_2 is finite?)

Now, explain how the more general statement in (i) follows.

(b) Explain why induction cannot be used to prove part (ii) of Theorem 1.5.8
from part (i).

(c) Show how arranging N into the two-dimensional array

         1    3    6   10   15  ···
         2    5    9   14  ···
         4    8   13  ···
         7   12  ···
        11  ···
         ⋮

leads to a proof of Theorem 1.5.8 (ii).
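
The array in part (c) is filled along diagonals, and the entry in a given row and column can be computed directly. The formula in the sketch below is an observation about the displayed pattern rather than something stated in the text, and the name entry is hypothetical; rows and columns are numbered starting from 1.

```python
def entry(i, j):
    """Natural number in row i, column j of the diagonal array."""
    d = i + j - 1                     # index of the diagonal containing (i, j)
    return d * (d - 1) // 2 + j       # entries before this diagonal, plus offset

for i in range(1, 6):
    print([entry(i, j) for j in range(1, 7 - i)])
```

Printing the first five rows reproduces the triangle shown in the exercise, and each natural number occupies exactly one position.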

Exercise 1.5.4. (a) Show (a, b) ~ R for any interval (a, b).

(b) Show that an unbounded interval like (a, ∞) = {x : x > a} has the same
cardinality as R as well.

(c) Using open intervals makes it more convenient to produce the required
1-1, onto functions, but it is not really necessary. Show that [0, 1) ~ (0, 1)
by exhibiting a 1-1 onto function between the two sets.

Exercise 1.5.5. (a) Why is A ~ A for every set A? 

(b) Given sets A and B, explain why A ~ B is equivalent to asserting B ~ A. 

(c) For three sets A, B , and C, show that A ~ B and B ~ C implies A ~ C. 
These three properties are what is meant by saying that ~ is an equivalence
relation.

Exercise 1.5.6. (a) Give an example of a countable collection of disjoint
open intervals.

(b) Give an example of an uncountable collection of disjoint open intervals, 
or argue that no such collection exists. 

Exercise 1.5.7. Consider the open interval (0,1), and let S be the set of points 
in the open unit square; that is, S = {(x,y) : 0 < x,y < 1}. 

(a) Find a 1-1 function that maps (0, 1) into, but not necessarily onto, S. 
(This is easy.) 

(b) Use the fact that every real number has a decimal expansion to produce 
a 1-1 function that maps S into (0,1). Discuss whether the formulated 
function is onto. (Keep in mind that any terminating decimal expansion 
such as .235 represents the same real number as .234999 . . . .) 

The Schröder-Bernstein Theorem discussed in Exercise 1.5.11 can now be
applied to conclude that (0, 1) ~ S.

Exercise 1.5.8. Let B be a set of positive real numbers with the property that 
adding together any finite subset of elements from B always gives a sum of 2 or 
less. Show B must be finite or countable. 

Exercise 1.5.9. A real number x ∈ R is called algebraic if there exist integers
a_0, a_1, a_2, ..., a_n ∈ Z, not all zero, such that

        a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0 = 0.


Said another way, a real number is algebraic if it is the root of a polynomial with 
integer coefficients. Real numbers that are not algebraic are called transcenden- 
tal numbers. Reread the last paragraph of Section 1.1. The final question posed 
here is closely related to the question of whether or not transcendental numbers 
exist. 

(a) Show that √2, ∛2, and √3 + √2 are algebraic.

(b) Fix n ∈ N, and let A_n be the algebraic numbers obtained as roots of
polynomials with integer coefficients that have degree n. Using the fact that
every polynomial has a finite number of roots, show that A_n is countable.

(c) Now, argue that the set of all algebraic numbers is countable. What may 
we conclude about the set of transcendental numbers? 


Exercise 1.5.10. (a) Let C ⊆ [0, 1] be uncountable. Show that there exists
a ∈ (0, 1) such that C ∩ [a, 1] is uncountable.

(b) Now let A be the set of all a ∈ (0, 1) such that C ∩ [a, 1] is uncountable,
and set α = sup A. Is C ∩ [α, 1] an uncountable set?

(c) Does the statement in (a) remain true if “uncountable” is replaced by
“infinite”?

Exercise 1.5.11 (Schröder-Bernstein Theorem). Assume there exists a
1-1 function f : X → Y and another 1-1 function g : Y → X. Follow the steps
to show that there exists a 1-1, onto function h : X → Y and hence X ~ Y.
The strategy is to partition X and Y into components

        X = A ∪ A'   and   Y = B ∪ B'

with A ∩ A' = ∅ and B ∩ B' = ∅, in such a way that f maps A onto B, and g
maps B' onto A'.

(a) Explain how achieving this would lead to a proof that X ~ Y.

(b) Set A_1 = X \ g(Y) = {x ∈ X : x ∉ g(Y)} (what happens if A_1 = ∅?) and
inductively define a sequence of sets by letting A_{n+1} = g(f(A_n)). Show
that {A_n : n ∈ N} is a pairwise disjoint collection of subsets of X, while
{f(A_n) : n ∈ N} is a similar collection in Y.

(c) Let A = ⋃_{n=1}^∞ A_n and B = ⋃_{n=1}^∞ f(A_n). Show that f maps A onto B.

(d) Let A' = X \ A and B' = Y \ B. Show g maps B' onto A'.

1.6 Cantor’s Theorem 

Cantor’s work into the theory of infinite sets extends far beyond the conclusions 
of Theorem 1.5.6. Although initially resisted, his creative and relentless assault 
in this area eventually produced a revolution in set theory and a paradigm shift 
in the way mathematicians came to understand the infinite. 

Cantor’s Diagonalization Method 

Cantor published his discovery that R is uncountable in 1874. Although it 
has some modern polish on it, the argument presented in Theorem 1.5.6 (ii) 
is actually quite similar to the one Cantor originally found. In 1891, Cantor 
offered another proof of this same fact that is startling in its simplicity. It 
relies on decimal representations for real numbers, which we will accept and use 
without any formal definitions. 

Theorem 1.6.1. The open interval (0, 1) = {x ∈ R : 0 < x < 1} is
uncountable.

Exercise 1.6.1. Show that (0, 1) is uncountable if and only if R is uncountable. 
This shows that Theorem 1.6.1 is equivalent to Theorem 1.5.6. 

Proof. As with Theorem 1.5.6, we proceed by contradiction and assume that
there does exist a function f : N → (0, 1) that is 1-1 and onto. For each m ∈ N,
f(m) is a real number between 0 and 1, and we represent it using the decimal
notation

        f(m) = .a_m1 a_m2 a_m3 a_m4 a_m5 ···




What is meant here is that for each m, n ∈ N, a_mn is the digit from the set
{0, 1, 2, ..., 9} that represents the nth digit in the decimal expansion of f(m).
The 1-1 correspondence between N and (0, 1) can be summarized in the doubly
indexed array


        N            (0, 1)

        1   <-->   f(1) = .a_11  a_12  a_13  a_14  a_15  a_16  ···
        2   <-->   f(2) = .a_21  a_22  a_23  a_24  a_25  a_26  ···
        3   <-->   f(3) = .a_31  a_32  a_33  a_34  a_35  a_36  ···
        4   <-->   f(4) = .a_41  a_42  a_43  a_44  a_45  a_46  ···
        5   <-->   f(5) = .a_51  a_52  a_53  a_54  a_55  a_56  ···
        6   <-->   f(6) = .a_61  a_62  a_63  a_64  a_65  a_66  ···
        ⋮              ⋮


The key assumption about this correspondence is that every real number in
(0, 1) is assumed to appear somewhere on the list.

Now for the pearl of the argument. Define a real number x ∈ (0, 1) with the
decimal expansion x = .b_1 b_2 b_3 b_4 ... using the rule

        b_n = 2    if a_nn ≠ 2,
        b_n = 3    if a_nn = 2.

Let’s be clear about this. To compute the digit b_1, we look at the digit a_11 in
the upper left-hand corner of the array. If a_11 = 2, then we choose b_1 = 3;
otherwise, we set b_1 = 2.
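
The rule for the digits b_n is mechanical and can be applied to any finite portion of such an array. In the sketch below, each row is a string of decimal digits standing in for an expansion .a_m1 a_m2 a_m3 ...; the function name diagonal_digits and the sample rows are made up for illustration.

```python
def diagonal_digits(rows):
    """Apply the rule b_n = 3 if a_nn = 2, else b_n = 2, to a square digit array."""
    digits = []
    for n, row in enumerate(rows):
        digits.append("3" if row[n] == "2" else "2")
    return "".join(digits)

rows = ["500000", "222222", "141592", "718281", "333333", "123456"]
b = diagonal_digits(rows)
print("." + b)
print(all(b[n] != rows[n][n] for n in range(len(rows))))
```

By construction, the resulting string differs from the nth row in its nth digit, which is the observation that Exercise 1.6.2 asks to be turned into an argument.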

Exercise 1.6.2. (a) Explain why the real number x = .b_1 b_2 b_3 b_4 ... cannot
be f(1).

(b) Now, explain why x ≠ f(2), and in general why x ≠ f(n) for any n ∈ N.

(c) Point out the contradiction that arises from these observations and
conclude that (0, 1) is uncountable. □

Exercise 1.6.3. Supply rebuttals to the following complaints about the proof 
of Theorem 1.6.1. 

(a) Every rational number has a decimal expansion, so we could apply this 
same argument to show that the set of rational numbers between 0 and 1 
is uncountable. However, because we know that any subset of Q must be 
countable, the proof of Theorem 1.6.1 must be flawed.

(b) Some numbers have two different decimal representations. Specifically, 
any decimal expansion that terminates can also be written with repeating 
9’s. For instance, 1/2 can be written as .5 or as .4999.... Doesn’t this 
cause some problems? 

Exercise 1.6.4. Let S be the set consisting of all sequences of 0’s and 1’s.
Observe that S is not a particular sequence, but rather a large set whose
elements are sequences; namely,

        S = {(a_1, a_2, a_3, ...) : a_n = 0 or 1}.

As an example, the sequence (1, 0, 1, 0, 1, 0, 1, 0, ...) is an element of S, as is the
sequence (1, 1, 1, 1, 1, 1, ...).

Give a rigorous argument showing that S is uncountable. 

Having distinguished between the countable infinity of N and the uncount- 
able infinity of R, a new question that occupied Cantor was whether or not there 
existed an infinity “above” that of R. This is logically treacherous territory. 
The same care we gave to defining the relationship “has the same cardinality 
as” needs to be given to defining relationships such as “has cardinality greater 
than” or “has cardinality less than or equal to.” Nevertheless, without getting 
too weighed down with formal definitions, one gets a very clear sense from the 
next result that there is a hierarchy of infinite sets that continues well beyond 
the continuum of R. 

Power Sets and Cantor’s Theorem 

Given a set A, the power set P(A) refers to the collection of all subsets of A. It
is important to understand that P(A) is itself considered a set whose elements 
are the different possible subsets of A. 
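
For a small finite set, the power set can be listed explicitly. The sketch below is one way to do so using the standard library; the name power_set is hypothetical, and subsets are represented as frozensets so that they can themselves be collected and compared.

```python
from itertools import combinations

def power_set(A):
    """Return a list of all subsets of the finite set A, as frozensets."""
    elems = list(A)
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

subsets = power_set({1, 2, 3, 4})
print(len(subsets))
print(frozenset() in subsets)   # the empty set is one of the subsets
```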

Exercise 1.6.5. (a) Let A = {a, b, c}. List the eight elements of P(A). (Do
not forget that ∅ is considered to be a subset of every set.)

(b) If A is finite with n elements, show that P(A) has 2^n elements.

Exercise 1.6.6. (a) Using the particular set A = {a, b, c}, exhibit two different
1-1 mappings from A into P(A).

(b) Letting C = {1, 2, 3, 4}, produce an example of a 1-1 map g : C → P(C).

(c) Explain why, in parts (a) and (b), it is impossible to construct mappings 
that are onto. 

Cantor’s Theorem states that the phenomenon in Exercise 1.6.6 holds for
infinite sets as well as finite sets. Whereas mapping A into P(A) is quite effortless,
finding an onto map is impossible.

Theorem 1.6.2 (Cantor’s Theorem). Given any set A, there does not exist 
a function f : A → P(A) that is onto.

Proof. This proof, like the others of its kind, is indirect. Thus, assume, for
contradiction, that f : A → P(A) is onto. Unlike the usual situation in which
we have sets of numbers for the domain and range, f is a correspondence between
a set and its power set. For each element a ∈ A, f(a) is a particular subset of A.
The assumption that f is onto means that every subset of A appears as f(a)
for some a ∈ A. To arrive at a contradiction, we will produce a subset B ⊆ A
that is not equal to f(a) for any a ∈ A.

Construct B using the following rule. For each element a ∈ A, consider the
subset f(a). This subset of A may contain the element a or it may not. This
depends on the function f. If f(a) does not contain a, then we include a in our
set B. More precisely, let

        B = {a ∈ A : a ∉ f(a)}.
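
The rule defining B can be carried out for any concrete function from a finite set into its power set. In the sketch below, the function is stored as a dictionary and the name missed_subset is hypothetical; the sample function is made up for illustration and is not one of the maps requested in the exercises.

```python
def missed_subset(A, f):
    """Return B = {a in A : a not in f(a)} for a map f given as a dict."""
    return frozenset(a for a in A if a not in f[a])

A = {1, 2, 3}
f = {1: frozenset({1, 2}), 2: frozenset(), 3: frozenset({1, 3})}

B = missed_subset(A, f)
print(B)
print(B in f.values())   # B is not in the range of f
```

Running this on other choices of f always produces a subset outside the range, and the general argument that follows explains why this must be so.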

Exercise 1.6.7. Return to the particular functions constructed in Exercise 1.6.6 
and construct the subset B that results using the preceding rule. In each case, 
note that B is not in the range of the function used. 

We now focus on the general argument. Because we have assumed that our
function f : A → P(A) is onto, it must be that B = f(a') for some a' ∈ A. The
contradiction arises when we consider whether or not a' is an element of B.

Exercise 1.6.8. (a) First, show that the case a' ∈ B leads to a contradiction.

(b) Now, finish the argument by showing that the case a' ∉ B is equally
unacceptable. □


To get an initial sense of its broad significance, let’s apply this result to 
the set of natural numbers. Cantor’s Theorem states that there is no onto 
function from N to P(N); in other words, the power set of the natural numbers 
is uncountable. How does the cardinality of this newly discovered uncountable 
set compare to the uncountable set of real numbers? 

Exercise 1.6.9. Using the various tools and techniques developed in the last 
two sections (including the exercises from Section 1.5), give a compelling
argument showing that P(N) ~ R.

Exercise 1.6.10. As a final exercise, answer each of the following by establish- 
ing a 1-1 correspondence with a set of known cardinality. 

(a) Is the set of all functions from {0, 1} to N countable or uncountable? 

(b) Is the set of all functions from N to {0, 1} countable or uncountable? 

(c) Given a set P, a subset A of P(P) is called an antichain if no element of A 
is a subset of any other element of A. Does P(N) contain an uncountable
antichain? 

1.7 Epilogue

The relationship of having the same cardinality is an equivalence relation (see 
Exercise 1.5.5), meaning, roughly, that all of the sets in the mathematical uni- 
verse can be organized into disjoint groups according to their size. Two sets 
appear in the same group, or equivalence class, if and only if they have the same
cardinality. Thus, N, Z, and Q are grouped together in one class with all of the 
other countable sets, whereas R is in another class that includes the intervals 
(a, b) as well as P(N). One implication of Cantor’s Theorem is that P(R) — the 
set of all subsets of R — is in a different class from R, and there is no reason 
to stop here. The set of subsets of P(R) — namely P(P(R)) — is in yet another 
class, and this process continues indefinitely. 

Having divided the universe of sets into disjoint groups, it would be con- 
venient to attach a “number” to each collection which could be used the way 
natural numbers are used to refer to the sizes of finite sets. Given a set X, 
there exists something called the cardinal number of X, denoted card X, which
behaves very much in this fashion. For instance, two sets X and Y satisfy
card X = card Y if and only if X ~ Y. (Rigorously defining card X requires
some significant set theory. One way this is done is to define card X to be a
very particular set that can always be uniquely found in the same equivalence
class as X.)

Looking back at Cantor’s Theorem, we get the strong sense that there is an 
order on the sizes of infinite sets that should be reflected in our new cardinal 
number system. Specifically, if it is possible to map a set X into Y in a 1-1
fashion, then we want card X ≤ card Y. Writing the strict inequality card X <
card Y should indicate that it is possible to map X into Y but that it is not the
case that X ~ Y. Restated in this notation, Cantor’s Theorem states that for
every set A, card A < card P(A).

There are some significant details to work out. A kind of metaphysical
problem arises when we realize that an implication of Cantor’s Theorem is that there
can be no “largest” set. A declaration such as, “Let U be the set of all possible
things,” is paradoxical because we immediately get that card U < card P(U)
and thus the set U does not contain everything it was advertised to hold.
Issues such as this one are ultimately resolved by imposing some restrictions on
what can qualify as a set. As set theory was formalized, the axioms had to
be crafted so that objects such as U are simply not allowed. A more
down-to-earth problem in need of attention is demonstrating that our definition of
“≤” between cardinal numbers really is an ordering. This involves showing that
cardinal numbers possess a property analogous to real numbers, which states
that if card X ≤ card Y and card Y ≤ card X, then card X = card Y. In the
end, this boils down to proving that if there exists f : X → Y that is 1-1,
and if there exists g : Y → X that is 1-1, then it is possible to find a function
h : X → Y that is both 1-1 and onto. A proof of this fact eluded Cantor
but was eventually supplied independently by Ernst Schröder (in 1896) and
Felix Bernstein (in 1898). An argument for the Schröder-Bernstein Theorem is
outlined in Exercise 1.5.11.
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There was another deep problem stemming from the budding theory of car- 
dinal numbers that occupied Cantor and which was not resolved during his 
lifetime. Because of the importance of countable sets, the symbol Hq (“aleph 
naught”) is frequently used for cardN. The subscript “0” is appropriate when 
we remember that countable sets are the smallest type of infinite set. In terms 
of cardinal numbers, if cardX < Ho, then X is finite. Thus, Hq is the small- 
est infinite cardinal number. The cardinality of R is also significant enough to 
deserve the special designation c = cardR = card(0, 1). The content of The- 
orems 1.5.6 and 1.6.1 is that Hq < c. The question that plagued Cantor was 
whether there were any cardinal numbers strictly in between these two. Put 
another way, does there exist a set A C R with cardN < card A < cardR? 
Cantor was of the opinion that no such set existed. In the ordering of cardinal 
numbers, he conjectured, c was the immediate successor of H 0 . 

Cantor’s “continuum hypothesis,” as it came to be called, was one of the 
most famous mathematical challenges of the past century. Its unexpected resolution
came in two parts. In 1940, the Austrian-born logician and mathematician
Kurt Gödel demonstrated that, using only the agreed-upon set of axioms of set
theory, there was no way to disprove the continuum hypothesis. In 1963, Paul
Cohen successfully showed that, under the same rules, it was also impossible to 
prove this conjecture. Taken together, what these two discoveries imply is that 
the continuum hypothesis is undecidable. It can be accepted or rejected as a 
statement about the nature of infinite sets, and in neither case will any logical 
contradictions arise. 

The mention of Kurt Gödel brings to mind a final comment about the significance
of Cantor’s work. Gödel is best known for his “Incompleteness Theorems,”
which pertain to the strength of axiomatic systems in general. What
Gödel showed was that any consistent axiomatic system created to study arithmetic
was necessarily destined to be “incomplete” in the sense that there would
always be true statements that the system of axioms would be too weak to
prove. At the heart of Gödel’s very complicated proof is a type of manipulation
closely related to what is happening in the proofs of Theorems 1.6.1 and 1.6.2. 
Variations of Cantor’s proof methods can also be found in the limitative re- 
sults of computer science. The “halting problem” asks, loosely, whether some 
general algorithm exists that can look at every program and decide if that pro- 
gram eventually terminates. The proof that no such algorithm exists uses a 
diagonalization-type construction at the core of the argument. The main point 
to make is that not only are the implications of Cantor’s theorems profound 
but the argumentative techniques are as well. As a more immediate example of 
this phenomenon, the diagonalization method is used again in Chapter 6 (in a
constructive way) as a crucial step in the proof of the Arzelà-Ascoli Theorem.


Chapter 2 

Sequences and Series 


2.1 Discussion: Rearrangements of Infinite 
Series 

Consider the infinite series 

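$$\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots.$$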

If we naively begin adding from the left-hand side, we get a sequence of what
are called partial sums. In other words, let $s_n$ equal the sum of the first $n$ terms
of the series, so that $s_1 = 1$, $s_2 = 1/2$, $s_3 = 5/6$, $s_4 = 7/12$, and so on. One
immediate observation is that the successive sums oscillate in a progressively
narrower space. The odd sums decrease ($s_1 > s_3 > s_5 > \cdots$) while the even
sums increase ($s_2 < s_4 < s_6 < \cdots$).
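The first few partial sums and the two monotone patterns can be reproduced exactly with rational arithmetic. This is a minimal sketch; the function name `partial_sums` and the cutoff of 500 terms are assumptions chosen for illustration.

```python
from fractions import Fraction

def partial_sums(n_terms):
    """Return the exact partial sums s_1, ..., s_n of the series sum (-1)^(n+1)/n."""
    sums, total = [], Fraction(0)
    for n in range(1, n_terms + 1):
        total += Fraction((-1) ** (n + 1), n)
        sums.append(total)
    return sums

s = partial_sums(500)
first_four = [str(x) for x in s[:4]]   # s_1 through s_4 as fractions
odd_sums = s[0::2]                     # s_1, s_3, s_5, ...
even_sums = s[1::2]                    # s_2, s_4, s_6, ...

print(first_four)
print(all(x > y for x, y in zip(odd_sums, odd_sums[1:])))    # odd sums decrease
print(all(x < y for x, y in zip(even_sums, even_sums[1:])))  # even sums increase
print(float(s[-1]))                    # a few hundred terms in
```

Every even sum computed here lies below every odd sum, which is the numerical picture behind the claim that the two subsequences squeeze toward a common value.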



[Figure: the partial sums on a number line from 0 to 1, arranged as $s_2 < s_4 < s_6 < \cdots < S < \cdots < s_5 < s_3 < s_1$.]

It seems reasonable (and we will soon prove) that the sequence $(s_n)$ eventually
homes in on a value, call it $S$, where the odd and even partial sums “meet.”
At this moment, we cannot compute $S$ precisely, but we know it falls somewhere
between 7/12 and 5/6. Summing a few hundred terms reveals that $S \approx 0.69$.
Whatever its value, there is now an overwhelming temptation to write 



$$S = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \cdots \tag{1}$$

meaning, perhaps, that if we could indeed add up all infinitely many of these
numbers, then the sum would equal S. A more familiar example of an equation 
of this type might be 


$$2 = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \cdots,$$




the only difference being that in the second equation we have a more recognizable 
value for the sum. 

But now for the crux of the matter. The symbols $+$, $-$, and $=$ in the preceding
equations are deceptively familiar notions being used in a very unfamiliar
way. The crucial question is whether or not properties of addition and equality 
that are well understood for finite sums remain valid when applied to infinite ob- 
jects such as equation (1). The answer, as we are about to witness, is somewhat 
ambiguous. 

Treating equation (1) in a standard algebraic way, let’s multiply through by 
1/2 and add it back to equation (1): 

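$$\begin{aligned}
\tfrac{1}{2} S &= \tfrac{1}{2} - \tfrac{1}{4} + \tfrac{1}{6} - \tfrac{1}{8} + \tfrac{1}{10} - \tfrac{1}{12} + \cdots \\
+\; S &= 1 - \tfrac{1}{2} + \tfrac{1}{3} - \tfrac{1}{4} + \tfrac{1}{5} - \tfrac{1}{6} + \tfrac{1}{7} - \tfrac{1}{8} + \tfrac{1}{9} - \tfrac{1}{10} + \tfrac{1}{11} - \tfrac{1}{12} + \tfrac{1}{13} - \cdots \\
\tfrac{3}{2} S &= 1 + \tfrac{1}{3} - \tfrac{1}{2} + \tfrac{1}{5} + \tfrac{1}{7} - \tfrac{1}{4} + \tfrac{1}{9} + \tfrac{1}{11} - \tfrac{1}{6} + \tfrac{1}{13} \cdots
\end{aligned} \tag{2}$$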









Now, look carefully at the result. The sum in equation (2) consists precisely 
of the same terms as those in the original equation (1), only in a different order. 
Specifically, the series in (2) is a rearrangement of (1) where we list the first
two positive terms $(1 + \frac{1}{3})$ followed by the first negative term $(-\frac{1}{2})$, followed
by the next two positive terms $(\frac{1}{5} + \frac{1}{7})$ and then the next negative term $(-\frac{1}{4})$.
Continuing this, it is apparent that every term in (2) appears in (1) and vice 
versa. The rub comes when we realize that equation (2) asserts that the sum of 
these rearranged, but otherwise unaltered, numbers is equal to 3/2 its original 
value. Indeed, adding a few hundred terms of equation (2) produces partial 
sums in the neighborhood of 1.03. Addition, in this infinite setting, is not 
commutative! 
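The numerical claim is easy to reproduce. The sketch below sums 300 terms of the original series and 300 terms of the "two positives, one negative" rearrangement; the generator name `rearranged_terms` and the choice of 300 terms are assumptions made for this illustration, and floating-point arithmetic is used, so the printed values are approximations.

```python
from itertools import islice

def rearranged_terms():
    """Yield 1, 1/3, -1/2, 1/5, 1/7, -1/4, ... (two positives, then one negative)."""
    odd, even = 1, 2
    while True:
        yield 1.0 / odd
        yield 1.0 / (odd + 2)
        yield -1.0 / even
        odd += 4
        even += 2

# Partial sum of the original alternating series, 300 terms.
original = sum((-1) ** (n + 1) / n for n in range(1, 301))

# Partial sum of the rearranged series, also 300 terms.
rearranged = sum(islice(rearranged_terms(), 300))

print(original, rearranged, rearranged / original)
```

The ratio printed at the end sits very close to 3/2, matching what equation (2) asserts about the rearranged sum.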

Let’s look at a similar rearrangement of the series 


$$\sum_{n=0}^{\infty} (-1/2)^n.$$


This series is geometric with first term 1 and common ratio $r = -1/2$. Using
the formula $1/(1 - r)$ for the sum of a geometric series (Example 2.7.5), we get

$$1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \frac{1}{16} - \frac{1}{32} + \frac{1}{64} - \frac{1}{128} + \frac{1}{256} \cdots = \frac{1}{1 - (-\frac{1}{2})} = \frac{2}{3}.$$

This time, some computational experimentation with the “two positives, one 
negative” rearrangement 

$$1 + \frac{1}{4} - \frac{1}{2} + \frac{1}{16} + \frac{1}{64} - \frac{1}{8} + \frac{1}{256} + \frac{1}{1024} - \frac{1}{32} \cdots$$

yields partial sums quite close to 2/3. The sum of the first 30 terms, for instance,
equals .666667. Infinite addition is commutative in some instances but not in 
others. 
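The 30-term figure quoted above can be checked with exact fractions. This is a small sketch; the function name `rearranged_geometric` is an assumption introduced here, and the pattern it builds is the "two positives, one negative" ordering described in the text.

```python
from fractions import Fraction

def rearranged_geometric(count):
    """First `count` terms of 1 + 1/4 - 1/2 + 1/16 + 1/64 - 1/8 + ..."""
    terms, p, q = [], 0, 0
    while len(terms) < count:
        terms.append(Fraction(1, 4 ** p))        # next positive term
        terms.append(Fraction(1, 4 ** (p + 1)))  # the positive term after it
        terms.append(Fraction(-1, 2 * 4 ** q))   # next negative term
        p += 2
        q += 1
    return terms[:count]

total = sum(rearranged_geometric(30))
print(round(float(total), 6))
```

Unlike the alternating harmonic series, this rearrangement stays next to the original value 2/3, which is the contrast the passage draws.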

Far from being a charming theoretical oddity of infinite series, this phe- 
nomenon can be the source of great consternation in many applied situations. 
How, for instance, should a double summation over two index variables be defined?
Let’s say we are given a grid of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$, where
$a_{ij} = 1/2^{j-i}$ if $j > i$, $a_{ij} = -1$ if $j = i$, and $a_{ij} = 0$ if $j < i$.


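$$\begin{array}{rrrrrl}
-1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \frac{1}{16} & \cdots \\
0 & -1 & \frac{1}{2} & \frac{1}{4} & \frac{1}{8} & \cdots \\
0 & 0 & -1 & \frac{1}{2} & \frac{1}{4} & \cdots \\
0 & 0 & 0 & -1 & \frac{1}{2} & \cdots \\
0 & 0 & 0 & 0 & -1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{array}$$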


We would like to attach a mathematical meaning to the summation 


$$\sum_{i,j=1}^{\infty} a_{ij},$$


whereby we intend to include every term in the preceding array in the total. 
One natural idea is to temporarily fix i and sum across each row. A moment’s 
reflection (and a fact about geometric series) shows that each row sums to 0. 
Summing the sums of the rows, we get 


$$\sum_{i,j=1}^{\infty} a_{ij} = \sum_{i=1}^{\infty} \left( \sum_{j=1}^{\infty} a_{ij} \right) = \sum_{i=1}^{\infty} (0) = 0.$$


We could just as easily have decided to fix j and sum down each column first. 
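In this case, we have

$$\sum_{i,j=1}^{\infty} a_{ij} = \sum_{j=1}^{\infty} \left( \sum_{i=1}^{\infty} a_{ij} \right) = \sum_{j=1}^{\infty} \left( \frac{-1}{2^{j-1}} \right) = -2.$$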



Changing the order of the summation changes the value of the sum! One com- 
mon way that double sums arise (although not this particular one) is from the 
multiplication of two series. There is a natural desire to write 


$$\left( \sum a_i \right) \left( \sum b_j \right) = \sum_{i,j=1}^{\infty} a_i b_j,$$


except that the expression on the right-hand side makes no sense at the moment. 
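The disagreement between summing the array by rows and by columns can be seen with exact arithmetic. In this sketch the function names `a`, `row_partial_sum`, and `column_sum` and the cutoff of 60 are assumptions made for illustration. Rows are infinite, so each row can only be truncated; columns have finitely many nonzero entries and can be summed exactly.

```python
from fractions import Fraction

def a(i, j):
    """Entry a_ij of the array: 1/2^(j-i) above the diagonal, -1 on it, 0 below."""
    if j > i:
        return Fraction(1, 2 ** (j - i))
    return Fraction(-1) if j == i else Fraction(0)

def row_partial_sum(i, last_column):
    """Sum of row i over columns 1..last_column (a truncation of the infinite row)."""
    return sum(a(i, j) for j in range(1, last_column + 1))

def column_sum(j):
    """Exact sum of column j; every entry below the diagonal is 0."""
    return sum(a(i, j) for i in range(1, j + 1))

row_tail = row_partial_sum(1, 60)                          # truncated first row
columns_total = sum(column_sum(j) for j in range(1, 61))   # first 60 column sums
print(float(row_tail), float(columns_total))
```

The truncated row sum shrinks toward 0 as the cutoff grows, while the accumulated column sums approach a nonzero value, which is the conflict described in the text.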
It is the pathologies that give rise to the need for rigor. A satisfying resolution
to the questions raised will require that we be absolutely precise about what
we mean as we manipulate these infinite objects. It may seem that progress is 
slow at first, but that is because we do not want to fall into the trap of letting 
the biases of our intuition corrupt our arguments. Rigorous proofs are meant 
to be a check on intuition, and in the end we will see that they vastly improve 
our mental picture of the mathematical infinite. 

As a final example, consider something as intuitively fundamental as the
associative property of addition applied to the series $\sum_{n=1}^{\infty} (-1)^n$. Grouping
the terms one way gives

$$(-1 + 1) + (-1 + 1) + (-1 + 1) + (-1 + 1) + \cdots = 0 + 0 + 0 + 0 + \cdots = 0,$$

whereas grouping in another yields

$$-1 + (1 - 1) + (1 - 1) + (1 - 1) + \cdots = -1 + 0 + 0 + 0 + \cdots = -1.$$

Manipulations that are legitimate in finite settings do not always extend to 
infinite settings. Deciding when they do and why they do not is one of the 
central themes of analysis. 
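One way to see what the two groupings are doing is to list the ungrouped partial sums of $-1 + 1 - 1 + 1 - \cdots$ and notice that each grouping only ever looks at every other one. The sketch below is illustrative; the variable names are assumptions introduced here.

```python
from itertools import accumulate

terms = [(-1) ** n for n in range(1, 13)]   # -1, 1, -1, 1, ...
partial_sums = list(accumulate(terms))      # -1, 0, -1, 0, ...

# (-1 + 1) + (-1 + 1) + ... sees only the sums after an even number of terms.
pairs_from_start = partial_sums[1::2]

# -1 + (1 - 1) + (1 - 1) + ... sees only the sums after an odd number of terms.
pairs_after_first = partial_sums[0::2]

print(partial_sums)
print(pairs_from_start, pairs_after_first)
```

The full list of partial sums never settles, and the two groupings pick out its two constant subsequences.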


2.2 The Limit of a Sequence 


An understanding of infinite series depends heavily on a clear understanding of 
the theory of sequences. In fact, most of the concepts in analysis can be reduced 
to statements about the behavior of sequences. Thus, we will spend a significant 
amount of time investigating sequences before taking on infinite series. 

Definition 2.2.1. A sequence is a function whose domain is N. 


This formal definition leads immediately to the familiar depiction of a sequence
as an ordered list of real numbers. Given a function $f : \mathbf{N} \to \mathbf{R}$, $f(n)$ is
just the $n$th term on the list. The notation for sequences reinforces this familiar
understanding. 


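Example 2.2.2. Each of the following is a common way to describe a sequence.

(i) $\left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots\right)$,

(ii) $\left(\frac{1+n}{n}\right)_{n=1}^{\infty} = \left(\frac{2}{1}, \frac{3}{2}, \frac{4}{3}, \ldots\right)$,

(iii) $(a_n)$, where $a_n = 2^n$ for each $n \in \mathbf{N}$,

(iv) $(x_n)$, where $x_1 = 2$ and $x_{n+1} = \frac{x_n + 1}{2}$.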


On occasion, it will be more convenient to index a sequence beginning with
$n = 0$ or $n = n_0$ for some natural number $n_0$ different from 1. These minor
variations should cause no confusion. What is essential is that a sequence be an
infinite list of real numbers. What happens at the beginning of such a list is of
little importance in most cases. The business of analysis is concerned with the
behavior of the infinite “tail” of a given sequence.

We now present what is arguably the most important definition in the book. 


Definition 2.2.3 (Convergence of a Sequence). A sequence $(a_n)$ converges
to a real number $a$ if, for every positive number $\epsilon$, there exists an $N \in \mathbf{N}$ such
that whenever $n \ge N$ it follows that $|a_n - a| < \epsilon$.


To indicate that $(a_n)$ converges to $a$, we usually write either $\lim a_n = a$ or
$(a_n) \to a$. The notation $\lim_{n \to \infty} a_n = a$ is also standard.

In an effort to decipher this complicated definition, it helps first to consider
the ending phrase “$|a_n - a| < \epsilon$,” and think about the points that satisfy an
inequality of this type.


Definition 2.2.4. Given a real number $a \in \mathbf{R}$ and a positive number $\epsilon > 0$,
the set

$$V_\epsilon(a) = \{x \in \mathbf{R} : |x - a| < \epsilon\}$$

is called the $\epsilon$-neighborhood of $a$.


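Notice that $V_\epsilon(a)$ consists of all of those points whose distance from $a$ is less
than $\epsilon$. Said another way, $V_\epsilon(a)$ is an interval, centered at $a$, with radius $\epsilon$.

[Figure: the interval $V_\epsilon(a)$ on a number line, running from $a - \epsilon$ to $a + \epsilon$ with center $a$.]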


Recasting the definition of convergence in terms of $\epsilon$-neighborhoods gives a
more geometric impression of what is being described.

Definition 2.2.3B (Convergence of a Sequence: Topological Version).
A sequence $(a_n)$ converges to $a$ if, given any $\epsilon$-neighborhood $V_\epsilon(a)$ of $a$, there
exists a point in the sequence after which all of the terms are in $V_\epsilon(a)$. In other
words, every $\epsilon$-neighborhood contains all but a finite number of the terms of
$(a_n)$.

[Figure: terms of $(a_n)$ on a number line together with the neighborhood $V_\epsilon(a)$, running from $a - \epsilon$ to $a + \epsilon$.]


Definition 2.2.3 and Definition 2.2.3B say precisely the same thing; the natural
number $N$ in the original version of the definition is the point where the
sequence $(a_n)$ enters $V_\epsilon(a)$, never to leave. It should be apparent that the value
of $N$ depends on the choice of $\epsilon$. The smaller the $\epsilon$-neighborhood, the larger $N$
may have to be.

Example 2.2.5. Consider the sequence $(a_n)$, where $a_n = 1/\sqrt{n}$.

Our intuitive understanding of limits points confidently to the conclusion
that

$$\lim \left( \frac{1}{\sqrt{n}} \right) = 0.$$



Before trying to prove this not too impressive fact, let’s first explore the relationship
between $\epsilon$ and $N$ in the definition of convergence. For the moment, take
$\epsilon$ to be 1/10. This defines a sort of “target zone” for the terms in the sequence.
By claiming that the limit of $(a_n)$ is 0, we are saying that the terms in this
sequence eventually get arbitrarily close to 0. How close? What do we mean
by “eventually”? We have set $\epsilon = 1/10$ as our standard for closeness, which
leads to the $\epsilon$-neighborhood $(-1/10, 1/10)$ centered around the limit 0. How
far out into the sequence must we look before the terms fall into this interval?
The 100th term $a_{100} = 1/10$ puts us right on the boundary, and a little thought
reveals that

$$\text{if } n > 100, \text{ then } a_n \in \left( -\frac{1}{10}, \frac{1}{10} \right).$$

Thus, for $\epsilon = 1/10$ we choose $N = 101$ (or anything larger) as our response.

Now, our choice of $\epsilon = 1/10$ was rather whimsical, and we can do this again,
letting $\epsilon = 1/50$. In this case, our target neighborhood shrinks to $(-1/50, 1/50)$,
and it is apparent that we must travel farther out into the sequence before $a_n$
falls into this interval. How far? Essentially, we require that

$$\frac{1}{\sqrt{n}} < \frac{1}{50}, \quad \text{which occurs as long as } n > 50^2 = 2500.$$

Thus, $N = 2501$ is a suitable response to the challenge of $\epsilon = 1/50$.
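Both responses come from solving $1/\sqrt{n} < \epsilon$ for $n$, which gives $n > 1/\epsilon^2$. The sketch below turns that into a small routine; the names `response_N` and `works` are assumptions introduced here, and $\epsilon$ is passed as an exact `Fraction` so that boundary cases such as $n = 100$ are not blurred by floating-point rounding.

```python
from fractions import Fraction
import math

def response_N(eps):
    """Smallest natural number N with N > 1/eps**2 (eps a positive Fraction)."""
    return math.floor(1 / eps ** 2) + 1

def works(N, eps, how_many=5000):
    """Check 1/sqrt(n) < eps, written exactly as 1/n < eps**2, for n = N, ..., N + how_many - 1."""
    return all(Fraction(1, n) < eps ** 2 for n in range(N, N + how_many))

for eps in (Fraction(1, 10), Fraction(1, 50)):
    N = response_N(eps)
    print(eps, N, works(N, eps))
```

A finite check of this kind is only a sanity test of the scratch work; the proof still has to handle every $n \ge N$ at once.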

It may seem as though this duel could continue forever, with different $\epsilon$
challenges being handed to us one after another, each one requiring a suitable
value of $N$ in response. In a sense, this is correct, except that the game is
effectively over the instant we recognize a rule for how to choose $N$ given an
arbitrary $\epsilon > 0$. For this problem, the desired algorithm is implicit in the algebra
carried out to compute the previous response of $N = 2501$. Whatever $\epsilon$ happens
to be, we want

$$\frac{1}{\sqrt{n}} < \epsilon, \quad \text{which is equivalent to insisting that } n > \frac{1}{\epsilon^2}.$$


With this observation, we are ready to write the formal argument.
We claim that

$$\lim \left( \frac{1}{\sqrt{n}} \right) = 0.$$

Proof. Let $\epsilon > 0$ be an arbitrary positive number. Choose a natural number $N$
satisfying

$$N > \frac{1}{\epsilon^2}.$$

We now verify that this choice of $N$ has the desired property. Let $n \ge N$. Then,

$$n > \frac{1}{\epsilon^2} \quad \text{implies} \quad \frac{1}{\sqrt{n}} < \epsilon, \quad \text{and hence} \quad |a_n - 0| < \epsilon.$$

□


Quantifiers 

The definition of convergence given earlier is the result of hundreds of years of 
refining the intuitive notion of limit into a mathematically rigorous statement. 
The logic involved is complicated and is intimately tied to the use of the quan- 
tifiers “for all” and “there exists.” Learning to write a grammatically correct 
convergence proof goes hand in hand with a deep understanding of why the 
quantifiers appear in the order that they do. 

The definition begins with the phrase, 

“For all $\epsilon > 0$, there exists $N \in \mathbf{N}$ such that ...”

Looking back at our first example, we see that our formal proof begins with, “Let
$\epsilon > 0$ be an arbitrary positive number.” This is followed by a construction of $N$
and then a demonstration that this choice of N has the desired property. This, 
in fact, is a basic outline for how every convergence proof should be presented. 

Template for a proof that $(x_n) \to x$:

- “Let $\epsilon > 0$ be arbitrary.”

- Demonstrate a choice for $N \in \mathbf{N}$. This step usually requires the most
work, almost all of which is done prior to actually writing the formal
proof.

- Now, show that $N$ actually works.

- “Assume $n \ge N$.”

- With $N$ well chosen, it should be possible to derive the inequality
$|x_n - x| < \epsilon$.


Example 2.2.6. Show

$$\lim \left( \frac{n+1}{n} \right) = 1.$$

As mentioned, before attempting a formal proof, we first need to do some
preliminary scratch work. In the first example, we experimented by assigning
specific values to $\epsilon$ (and it is not a bad idea to do this again), but let us skip
straight to the algebraic punch line. The last line of our proof should be that
for suitably large values of $n$,

$$\left| \frac{n+1}{n} - 1 \right| < \epsilon.$$

Because

$$\left| \frac{n+1}{n} - 1 \right| = \frac{1}{n},$$

this is equivalent to the inequality $1/n < \epsilon$ or $n > 1/\epsilon$. Thus, choosing $N$ to be
an integer greater than $1/\epsilon$ will suffice.

With the work of the proof done, all that remains is the formal writeup. 

Proof. Let $\epsilon > 0$ be arbitrary. Choose $N \in \mathbf{N}$ with $N > 1/\epsilon$. To verify that
this choice of $N$ is appropriate, let $n \in \mathbf{N}$ satisfy $n \ge N$. Then, $n \ge N$ implies
$n > 1/\epsilon$, which is the same as saying $1/n < \epsilon$. Finally, this means

$$\left| \frac{n+1}{n} - 1 \right| < \epsilon,$$

as desired. □

It is instructive to see what goes wrong in the previous example if we try to 
prove that our sequence converges to some limit other than 1. 

Theorem 2.2.7 (Uniqueness of Limits). The limit of a sequence, when it 
exists, must be unique. 

Proof. Exercise 2.2.6. □ 

Divergence 

Significant insight into the role of the quantifiers in the definition of convergence 
can be gained by studying an example of a sequence that does not have a limit. 

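Example 2.2.8. Consider the sequence

$$\left( 1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \cdots \right).$$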

How can we argue that this sequence does not converge to zero? Looking at the 
first few terms, it seems the initial evidence actually supports such a conclusion. 
Given a challenge of $\epsilon = 1/2$, a little reflection reveals that after $N = 3$ all the
terms fall into the neighborhood $(-1/2, 1/2)$. We could also handle $\epsilon = 1/4$.
(What is the smallest possible $N$ in this case?)

But the definition of convergence says “For all $\epsilon > 0$ ...,” and it should be
apparent that there is no response to a choice of $\epsilon = 1/10$, for instance. This
leads us to an important observation about the logical negation of the definition
of convergence of a sequence. To prove that a particular number $x$ is not the
limit of a sequence $(x_n)$, we must produce a single value of $\epsilon$ for which no $N \in \mathbf{N}$
works. More generally speaking, the negation of a statement that begins “For all
P, there exists Q ...” is the statement, “For at least one P, no Q is possible ...”
For instance, how could we disprove the spurious claim that “At every college
in the United States, there is a student who is at least seven feet tall”?

We have argued that the preceding sequence does not converge to 0. Let’s
argue against the claim that it converges to 1/5. Choosing $\epsilon = 1/10$ produces
the neighborhood (1/10, 3/10). Although the sequence continually revisits this
neighborhood, there is no point at which it enters and never leaves as the definition
requires. Thus, no $N$ exists for $\epsilon = 1/10$, so the sequence does not converge
to 1/5.
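The negated definition suggests a concrete search: fix one $\epsilon$, and for each candidate $N$ look for a later term that falls outside the neighborhood. The sketch below does this for a sequence that begins $1, -1/2, 1/3, -1/4$ and then alternates between $1/5$ and $-1/5$. That alternating-sign reading of the example, along with the names `a` and `find_violation` and the finite search window, are assumptions made for this illustration.

```python
from fractions import Fraction

def a(n):
    """n-th term of (1, -1/2, 1/3, -1/4, 1/5, -1/5, 1/5, -1/5, ...)."""
    sign = 1 if n % 2 == 1 else -1
    return Fraction(sign, min(n, 5))

def find_violation(limit, eps, N, search_width=10):
    """Return some n >= N with |a_n - limit| >= eps, or None if the window has none."""
    for n in range(N, N + search_width):
        if abs(a(n) - limit) >= eps:
            return n
    return None

eps = Fraction(1, 10)
# For the proposed limit 1/5, every candidate N tried is defeated by a later term.
no_N_works = all(find_violation(Fraction(1, 5), eps, N) is not None
                 for N in range(1, 200))
print(no_N_works)
print(find_violation(Fraction(0), Fraction(1, 2), 3))  # eps = 1/2 and limit 0: N = 3 survives
```

The search only inspects finitely many values of $N$, so it illustrates the argument rather than replacing it; the proof rests on the fact that the terms equal to $-1/5$ recur forever.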

Of course, this sequence does not converge to any other real number, and it 
would be more satisfying to simply say that this sequence does not converge. 

Definition 2.2.9. A sequence that does not converge is said to diverge. 

Although it is not too difficult, we will postpone arguing for divergence in general 
until we develop a more economical divergence criterion later in Section 2.5. 


Exercises 


Exercise 2.2.1. What happens if we reverse the order of the quantifiers in 
Definition 2.2.3? 

Definition: A sequence $(x_n)$ verconges to $x$ if there exists an $\epsilon > 0$ such that
for all $N \in \mathbf{N}$ it is true that $n \ge N$ implies $|x_n - x| < \epsilon$.

Give an example of a vercongent sequence. Is there an example of a ver- 
congent sequence that is divergent? Can a sequence verconge to two different 
values? What exactly is being described in this strange definition? 


Exercise 2.2.2. Verify, using the definition of convergence of a sequence, that 
the following sequences converge to the proposed limit. 

( a ) lim §S+3 = !• 

(b) lim = 0. 

(c) lim = 0. 

Exercise 2.2.3. Describe what we would have to demonstrate in order to dis- 
prove each of the following statements. 

(a) At every college in the United States, there is a student who is at least 
seven feet tall. 

(b) For all colleges in the United States, there exists a professor who gives 
every student a grade of either A or B. 

(c) There exists a college in the United States where every student is at least 
six feet tall. 


Exercise 2.2.4. Give an example of each or state that the request is impossible. 
For any that are impossible, give a compelling argument for why that is the case. 

(a) A sequence with an infinite number of ones that does not converge to one.

(b) A sequence with an infinite number of ones that converges to a limit not
equal to one.

(c) A divergent sequence such that for every $n \in \mathbf{N}$ it is possible to find $n$
consecutive ones somewhere in the sequence.


Exercise 2.2.5. Let $[[x]]$ be the greatest integer less than or equal to $x$. For
example, $[[\pi]] = 3$ and $[[3]] = 3$. For each sequence, find $\lim a_n$ and verify it
with the definition of convergence.

(a) $a_n = [[5/n]]$,

(b) $a_n = [[(12 + 4n)/3n]]$.

Reflecting on these examples, comment on the statement following
Definition 2.2.3 that “the smaller the $\epsilon$-neighborhood, the larger $N$ may have
to be.”


Exercise 2.2.6. Prove Theorem 2.2.7. To get started, assume $(a_n) \to a$ and
also that $(a_n) \to b$. Now argue $a = b$.

Exercise 2.2.7. Here are two useful definitions: 


(i) A sequence $(a_n)$ is eventually in a set $A \subseteq \mathbf{R}$ if there exists an $N \in \mathbf{N}$
such that $a_n \in A$ for all $n \ge N$.

(ii) A sequence $(a_n)$ is frequently in a set $A \subseteq \mathbf{R}$ if, for every $N \in \mathbf{N}$, there
exists an $n \ge N$ such that $a_n \in A$.

(a) Is the sequence $(-1)^n$ eventually or frequently in the set $\{1\}$?

(b) Which definition is stronger? Does frequently imply eventually or
does eventually imply frequently?

(c) Give an alternate rephrasing of Definition 2.2.3B using either frequently
or eventually. Which is the term we want?

(d) Suppose an infinite number of terms of a sequence $(x_n)$ are equal
to 2. Is $(x_n)$ necessarily eventually in the interval (1.9, 2.1)? Is it
frequently in (1.9, 2.1)?

Exercise 2.2.8. For some additional practice with nested quantifiers, consider 
the following invented definition: 

Let’s call a sequence $(x_n)$ zero-heavy if there exists $M \in \mathbf{N}$ such that for all
$N \in \mathbf{N}$ there exists $n$ satisfying $N \le n \le N + M$ where $x_n = 0$.

(a) Is the sequence $(0, 1, 0, 1, 0, 1, \ldots)$ zero-heavy?

(b) If a sequence is zero-heavy does it necessarily contain an infinite number 
of zeros? If not, provide a counterexample. 

(c) If a sequence contains an infinite number of zeros, is it necessarily zero- 
heavy? If not, provide a counterexample. 

(d) Form the logical negation of the above definition. That is, complete the
sentence: A sequence is not zero-heavy if ... .


2.3 The Algebraic and Order Limit
Theorems


The real purpose of creating a rigorous definition for convergence of a sequence is 
not to have a tool to verify computational statements such as lim 2n/(n + 2) = 2. 
Historically, a definition of the limit like Definition 2.2.3 came 150 years after the 
founders of calculus began working with intuitive notions of convergence. The 
point of having such a logically tight description of convergence is so that we 
can confidently prove statements about convergent sequences in general. We are 
ultimately trying to resolve arguments about what is and is not true regarding 
the behavior of limits with respect to the mathematical manipulations we intend 
to inflict on them. 

As a first example, let us prove that convergent sequences are bounded. The 
term “bounded” has a rather familiar connotation but, like everything else, we 
need to be explicit about what it means in this context. 


Definition 2.3.1. A sequence $(x_n)$ is bounded if there exists a number $M > 0$
such that $|x_n| \le M$ for all $n \in \mathbf{N}$.

Geometrically, this means that we can find an interval $[-M, M]$ that contains
every term in the sequence $(x_n)$.


Theorem 2.3.2. Every convergent sequence is bounded. 


Proof. Assume $(x_n)$ converges to a limit $l$. This means that given a particular
value of $\epsilon$, say $\epsilon = 1$, we know there must exist an $N \in \mathbf{N}$ such that if $n \ge N$,
then $x_n$ is in the interval $(l - 1, l + 1)$. Not knowing whether $l$ is positive or
negative, we can certainly conclude that

$$|x_n| < |l| + 1$$

for all $n \ge N$.

[Figure: the terms $x_1, x_2, x_3, \ldots$ on a number line, with $x_n$ for $n \ge N$ inside the interval from $l - 1$ to $l + 1$ and the bound $M$ marked.]

We still need to worry (slightly) about the terms in the sequence that come
before the $N$th term. Because there are only a finite number of these, we let

$$M = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |l| + 1\}.$$

It follows that $|x_n| \le M$ for all $n \in \mathbf{N}$, as desired.


□ 
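The construction of $M$ in the proof can be carried out for a concrete sequence. The sketch below assumes the illustrative sequence $x_n = 10(-1)^n/n$, which converges to $l = 0$, and takes $N = 11$ as the response to $\epsilon = 1$; both are choices made for this example and do not come from the text.

```python
from fractions import Fraction

def x(n):
    """Illustrative sequence x_n = 10 * (-1)**n / n, which converges to l = 0."""
    return Fraction(10 * (-1) ** n, n)

limit = Fraction(0)
N = 11  # for epsilon = 1: |x_n - 0| = 10/n < 1 whenever n >= 11

# M = max{|x_1|, ..., |x_{N-1}|, |l| + 1}, as in the proof.
M = max([abs(x(n)) for n in range(1, N)] + [abs(limit) + 1])

bounded = all(abs(x(n)) <= M for n in range(1, 2001))
print(M, bounded)
```

Here the early terms dominate, so $M$ comes from $|x_1|$; for a sequence whose first terms are small, the entry $|l| + 1$ would be the one that matters.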


This chapter began with a demonstration of how applying familiar algebraic 
properties (commutativity of addition) to infinite objects (series) can lead to 
paradoxical results. These examples are meant to instill in us a sense of caution
and justify the extreme care we are taking in drawing our conclusions. The
following theorems illustrate that sequences behave extremely well with respect 
to the operations of addition, multiplication, division, and order. 

Theorem 2.3.3 (Algebraic Limit Theorem). Let $\lim a_n = a$, and $\lim b_n = b$. Then,

(i) $\lim(c a_n) = ca$, for all $c \in \mathbf{R}$;

(ii) $\lim(a_n + b_n) = a + b$;

(iii) $\lim(a_n b_n) = ab$;

(iv) $\lim(a_n / b_n) = a/b$, provided $b \ne 0$.

Proof. (i) Consider the case where $c \ne 0$. We want to show that the sequence
$(c a_n)$ converges to $ca$, so the structure of the proof follows the template we
described in Section 2.2. First, we let $\epsilon$ be some arbitrary positive number. Our
goal is to find some point in the sequence $(c a_n)$ after which we have

$$|c a_n - ca| < \epsilon.$$

Now,

$$|c a_n - ca| = |c|\,|a_n - a|.$$

We are given that $(a_n) \to a$, so we know we can make $|a_n - a|$ as small as we
like. In particular, we can choose an $N$ such that

$$|a_n - a| < \frac{\epsilon}{|c|}$$

whenever $n \ge N$. To see that this $N$ indeed works, observe that, for all $n \ge N$,

$$|c a_n - ca| = |c|\,|a_n - a| < |c| \frac{\epsilon}{|c|} = \epsilon.$$

The case $c = 0$ reduces to showing that the constant sequence $(0, 0, 0, \ldots)$ converges
to 0, which is easily verified.

Before continuing with parts (ii), (iii), and (iv), we should point out that
the proof of (i), while somewhat short, is extremely typical for a convergence
proof. Before embarking on a formal argument, it is a good idea to take an
inventory of what we want to make less than $\epsilon$, and what we are given can be
made small for suitable choices of $n$. For the previous proof, we wanted to make
$|c a_n - ca| < \epsilon$, and we were given $|a_n - a| <$ anything we like (for large values
of $n$). Notice that in (i), and all of the ensuing arguments, the strategy each
time is to bound the quantity we want to be less than $\epsilon$, which in each case is

$$|(\text{terms of sequence}) - (\text{proposed limit})|,$$

with some algebraic combination of quantities over which we have control.

(ii) To prove this statement, we need to argue that the quantity

$$|(a_n + b_n) - (a + b)|$$

can be made less than an arbitrary $\epsilon$ using the assumptions that $|a_n - a|$ and
$|b_n - b|$ can be made as small as we like for large $n$. The first step is to use the
triangle inequality (Example 1.2.5) to say

$$|(a_n + b_n) - (a + b)| = |(a_n - a) + (b_n - b)| \le |a_n - a| + |b_n - b|.$$

Again, we let $\epsilon > 0$ be arbitrary. The technique this time is to divide the $\epsilon$
between the two expressions on the right-hand side in the preceding inequality.
Using the hypothesis that $(a_n) \to a$, we know there exists an $N_1$ such that

$$|a_n - a| < \frac{\epsilon}{2} \quad \text{whenever } n \ge N_1.$$

Likewise, the assumption that $(b_n) \to b$ means that we can choose an $N_2$ so
that

$$|b_n - b| < \frac{\epsilon}{2} \quad \text{whenever } n \ge N_2.$$

The question now arises as to which of $N_1$ or $N_2$ we should take to be our
choice of $N$. By choosing $N = \max\{N_1, N_2\}$, we ensure that if $n \ge N$, then
$n \ge N_1$ and $n \ge N_2$. This allows us to conclude that

$$|(a_n + b_n) - (a + b)| \le |a_n - a| + |b_n - b| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

for all $n \ge N$, as desired.
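The choice $N = \max\{N_1, N_2\}$ can be traced through on a concrete pair of sequences. The sketch below assumes the illustrative sequences $a_n = 1/n$ (limit 0) and $b_n = 1 + 3/n$ (limit 1); these, the helper name `choose_N`, and the value $\epsilon = 1/100$ are choices made for this example and do not come from the text.

```python
from fractions import Fraction
import math

def a(n):
    return Fraction(1, n)          # a_n -> 0

def b(n):
    return 1 + Fraction(3, n)      # b_n -> 1

def choose_N(eps):
    """Split eps in half between the two sequences and take the later starting point."""
    half = eps / 2
    N1 = math.floor(1 / half) + 1  # n >= N1 gives |a_n - 0| = 1/n < eps/2
    N2 = math.floor(3 / half) + 1  # n >= N2 gives |b_n - 1| = 3/n < eps/2
    return max(N1, N2)

eps = Fraction(1, 100)
N = choose_N(eps)
ok = all(abs((a(n) + b(n)) - 1) < eps for n in range(N, N + 1000))
print(N, ok)
```

The value of $N$ found this way is usually not the smallest one that works; the proof only needs some $N$, and splitting $\epsilon$ evenly is simply the most convenient bookkeeping.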


(iii) To show that $(a_n b_n) \to ab$, we begin by observing that

$$\begin{aligned}
|a_n b_n - ab| &= |a_n b_n - a b_n + a b_n - ab| \\
&\le |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n|\,|a_n - a| + |a|\,|b_n - b|.
\end{aligned}$$

In the initial step, we subtracted and then added $a b_n$, which created an opportunity
to use the triangle inequality. Essentially, we have broken up the distance
from $a_n b_n$ to $ab$ with a midway point and are using the sum of the two distances
to overestimate the original distance. This clever trick will become a familiar
technique in arguments to come.

Letting $\epsilon > 0$ be arbitrary, we again proceed with the strategy of making each
piece in the preceding inequality less than $\epsilon/2$. For the piece on the right-hand
side ($|a|\,|b_n - b|$), if $a \ne 0$ we can choose $N_1$ so that

$$n \ge N_1 \quad \text{implies} \quad |b_n - b| < \frac{1}{|a|} \frac{\epsilon}{2}.$$

(The case when $a = 0$ is handled in Exercise 2.3.9.) Getting the term on the
left-hand side ($|b_n|\,|a_n - a|$) to be less than $\epsilon/2$ is complicated by the fact that
we have a variable quantity $|b_n|$ to contend with as opposed to the constant
$|a|$ we encountered in the right-hand term. The idea is to replace $|b_n|$ with
a worst-case estimate. Using the fact that convergent sequences are bounded
(Theorem 2.3.2), we know there exists a bound $M > 0$ satisfying $|b_n| \le M$ for
all $n \in \mathbf{N}$. Now, we can choose $N_2$ so that

$$|a_n - a| < \frac{1}{M} \frac{\epsilon}{2} \quad \text{whenever } n \ge N_2.$$

To finish the argument, pick $N = \max\{N_1, N_2\}$, and observe that if $n \ge N$,
then

$$\begin{aligned}
|a_n b_n - ab| &\le |a_n b_n - a b_n| + |a b_n - ab| \\
&= |b_n|\,|a_n - a| + |a|\,|b_n - b| \\
&\le M |a_n - a| + |a|\,|b_n - b| \\
&< M \left( \frac{\epsilon}{2M} \right) + |a| \left( \frac{\epsilon}{2|a|} \right) = \epsilon.
\end{aligned}$$

(iv) This final statement will follow from (iii) if we can prove that

$$(b_n) \to b \quad \text{implies} \quad \left( \frac{1}{b_n} \right) \to \frac{1}{b}$$

whenever $b \ne 0$. We begin by observing that

$$\left| \frac{1}{b_n} - \frac{1}{b} \right| = \frac{|b - b_n|}{|b|\,|b_n|}.$$

Because $(b_n) \to b$, we can make the preceding numerator as small as we like by
choosing $n$ large. The problem comes in that we need a worst-case estimate on
the size of $1/(|b|\,|b_n|)$. Because the $b_n$ terms are in the denominator, we are no
longer interested in an upper bound on $|b_n|$ but rather in an inequality of the
form $|b_n| \ge \delta > 0$. This will then lead to a bound on the size of $1/(|b|\,|b_n|)$.

The trick is to look far enough out into the sequence $(b_n)$ so that the terms
are closer to $b$ than they are to 0. Consider the particular value $\epsilon_0 = |b|/2$.
Because $(b_n) \to b$, there exists an $N_1$ such that $|b_n - b| < |b|/2$ for all $n \ge N_1$.
This implies $|b_n| > |b|/2$.

Next, choose $N_2$ so that $n \ge N_2$ implies

$$|b_n - b| < \frac{\epsilon |b|^2}{2}.$$

Finally, if we let $N = \max\{N_1, N_2\}$, then $n \ge N$ implies

$$\left| \frac{1}{b_n} - \frac{1}{b} \right| = |b - b_n| \frac{1}{|b|\,|b_n|} < \frac{\epsilon |b|^2}{2} \cdot \frac{1}{|b| \frac{|b|}{2}} = \epsilon.$$

□ 
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Limits and Order 

Although there are a few dangers to avoid (see Exercise 2.3.7), the Algebraic 
Limit Theorem verifies that the relationship between algebraic combinations of 
sequences and the limiting process is as trouble-free as we could hope for. Limits 
can be computed from the individual component sequences provided that each 
component limit exists. The limiting process is also well-behaved with respect 
to the order operation. 


Theorem 2.3.4 (Order Limit Theorem). Assume $\lim a_n = a$ and $\lim b_n = b$. 

(i) If $a_n \geq 0$ for all $n \in \mathbf{N}$, then $a \geq 0$. 

(ii) If $a_n \leq b_n$ for all $n \in \mathbf{N}$, then $a \leq b$. 

(iii) If there exists $c \in \mathbf{R}$ for which $c \leq b_n$ for all $n \in \mathbf{N}$, then $c \leq b$. Similarly, 
if $a_n \leq c$ for all $n \in \mathbf{N}$, then $a \leq c$. 


Proof. (i) We will prove this by contradiction; thus, let's assume $a < 0$. The 
idea is to produce a term in the sequence $(a_n)$ that is also less than zero. To 
do this, we consider the particular value $\epsilon = |a|$. The definition of convergence 
guarantees that we can find an $N$ such that $|a_n - a| < |a|$ for all $n \geq N$. In 
particular, this would mean that $|a_N - a| < |a|$, which implies $a_N < 0$. This 
contradicts our hypothesis that $a_N \geq 0$. We therefore conclude that $a \geq 0$. 

[Figure: number line showing $a_N$ inside the interval from $a - \epsilon_0$ to $a + \epsilon_0 = 0$, centered at $a$, with the earlier terms $a_2$ and $a_1$ to the right.] 

(ii) The Algebraic Limit Theorem ensures that the sequence $(b_n - a_n)$ converges 
to $b - a$. Because $b_n - a_n \geq 0$, we can apply part (i) to get that $b - a \geq 0$. 

(iii) Take $a_n = c$ (or $b_n = c$) for all $n \in \mathbf{N}$, and apply (ii). □ 

A word about the idea of “tails” is in order. Loosely speaking, limits and 
their properties do not depend at all on what happens at the beginning of 
the sequence but are strictly determined by what happens when n gets large. 
Changing the value of the first ten — or ten thousand — terms in a particular 
sequence has no effect on the limit. Theorem 2.3.4, part (i), for instance, assumes 
that $a_n \geq 0$ for all $n \in \mathbf{N}$. However, the hypothesis could be weakened by 
assuming only that there exists some point $N_1$ where $a_n \geq 0$ for all $n \geq N_1$. 
The theorem remains true, and in fact the same proof is valid with the provision 
that when $N$ is chosen it be at least as large as $N_1$. 

In the language of analysis, when a property (such as non-negativity) is not 
necessarily possessed by some finite number of initial terms but is possessed 
by all terms in the sequence after some point $N$, we say that the sequence 
eventually has this property. (See Exercise 2.2.7.) Theorem 2.3.4, part (i), could 
be restated, “Convergent sequences that are eventually nonnegative converge to 
nonnegative limits.” Parts (ii) and (iii) have similar modifications, as will many 
other upcoming results. 

Exercises 

Exercise 2.3.1. Let $x_n \geq 0$ for all $n \in \mathbf{N}$. 

(a) If $(x_n) \to 0$, show that $(\sqrt{x_n}) \to 0$. 

(b) If $(x_n) \to x$, show that $(\sqrt{x_n}) \to \sqrt{x}$. 

Exercise 2.3.2. Using only Definition 2.2.3, prove that if $(x_n) \to 2$, then 

(a) $\left(\frac{2x_n - 1}{3}\right) \to 1$; 

(b) $(1/x_n) \to 1/2$. 

(For this exercise the Algebraic Limit Theorem is off-limits, so to speak.) 

Exercise 2.3.3 (Squeeze Theorem). Show that if $x_n \leq y_n \leq z_n$ for all 
$n \in \mathbf{N}$, and if $\lim x_n = \lim z_n = l$, then $\lim y_n = l$ as well. 

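Exercise 2.3.4. Let $(a_n) \to 0$, and use the Algebraic Limit Theorem to compute 
each of the following limits (assuming the fractions are always defined): 

(a) $\lim \left( \frac{1 + 2a_n}{1 + 3a_n - 4a_n^2} \right)$ 

(b) $\lim \left( \frac{(a_n + 2)^2 - 4}{a_n} \right)$ 

(c) $\lim \left( \frac{\frac{2}{a_n} + 3}{\frac{1}{a_n} + 5} \right)$ 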

Exercise 2.3.5. Let $(x_n)$ and $(y_n)$ be given, and define $(z_n)$ to be the “shuffled” 
sequence $(x_1, y_1, x_2, y_2, x_3, y_3, \ldots, x_n, y_n, \ldots)$. Prove that $(z_n)$ is convergent if 
and only if $(x_n)$ and $(y_n)$ are both convergent with $\lim x_n = \lim y_n$. 

Exercise 2.3.6. Consider the sequence given by $b_n = n - \sqrt{n^2 + 2n}$. Taking 
$(1/n) \to 0$ as given, and using both the Algebraic Limit Theorem and the result 
in Exercise 2.3.1, show $\lim b_n$ exists and find the value of the limit. 

Exercise 2.3.7. Give an example of each of the following, or state that such a 
request is impossible by referencing the proper theorem(s): 

(a) sequences $(x_n)$ and $(y_n)$, which both diverge, but whose sum $(x_n + y_n)$ 
converges; 

(b) sequences $(x_n)$ and $(y_n)$, where $(x_n)$ converges, $(y_n)$ diverges, and $(x_n + y_n)$ 
converges; 

(c) a convergent sequence $(b_n)$ with $b_n \neq 0$ for all $n$ such that $(1/b_n)$ diverges; 

(d) an unbounded sequence $(a_n)$ and a convergent sequence $(b_n)$ with $(a_n - b_n)$ 
bounded; 

(e) two sequences $(a_n)$ and $(b_n)$, where $(a_n b_n)$ and $(a_n)$ converge but $(b_n)$ 
does not. 

Exercise 2.3.8. Let $(x_n) \to x$ and let $p(x)$ be a polynomial. 

(a) Show $p(x_n) \to p(x)$. 

(b) Find an example of a function $f(x)$ and a convergent sequence $(x_n) \to x$ 
where the sequence $f(x_n)$ converges, but not to $f(x)$. 

Exercise 2.3.9. (a) Let $(a_n)$ be a bounded (not necessarily convergent) 
sequence, and assume $\lim b_n = 0$. Show that $\lim(a_n b_n) = 0$. Why are 
we not allowed to use the Algebraic Limit Theorem to prove this? 

(b) Can we conclude anything about the convergence of $(a_n b_n)$ if we assume 
that $(b_n)$ converges to some nonzero limit $b$? 

(c) Use (a) to prove Theorem 2.3.3, part (iii), for the case when $a = 0$. 

Exercise 2.3.10. Consider the following list of conjectures. Provide a short 
proof for those that are true and a counterexample for any that are false. 

(a) If $\lim(a_n - b_n) = 0$, then $\lim a_n = \lim b_n$. 

(b) If $(b_n) \to b$, then $|b_n| \to |b|$. 

(c) If $(a_n) \to a$ and $(b_n - a_n) \to 0$, then $(b_n) \to a$. 

(d) If $(a_n) \to 0$ and $|b_n - b| \leq a_n$ for all $n \in \mathbf{N}$, then $(b_n) \to b$. 

Exercise 2.3.11 (Cesaro Means). (a) Show that if $(x_n)$ is a convergent 
sequence, then the sequence given by the averages 

\[ y_n = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

also converges to the same limit. 

(b) Give an example to show that it is possible for the sequence $(y_n)$ of averages 
to converge even if $(x_n)$ does not. 

Exercise 2.3.12. A typical task in analysis is to decipher whether a property 
possessed by every term in a convergent sequence is necessarily inherited by 
the limit. Assume $(a_n) \to a$, and determine the validity of each claim. Try to 
produce a counterexample for any that are false. 

(a) If every $a_n$ is an upper bound for a set $B$, then $a$ is also an upper bound 
for $B$. 

(b) If every $a_n$ is in the complement of the interval $(0, 1)$, then $a$ is also in the 
complement of $(0, 1)$. 

(c) If every $a_n$ is rational, then $a$ is rational. 

Exercise 2.3.13 (Iterated Limits). Given a doubly indexed array $a_{mn}$ where 
$m, n \in \mathbf{N}$, what should $\lim_{m,n \to \infty} a_{mn}$ represent? 

(a) Let $a_{mn} = m/(m + n)$ and compute the iterated limits 

\[ \lim_{n \to \infty} \left( \lim_{m \to \infty} a_{mn} \right) \quad \text{and} \quad \lim_{m \to \infty} \left( \lim_{n \to \infty} a_{mn} \right). \]

Define $\lim_{m,n \to \infty} a_{mn} = a$ to mean that for all $\epsilon > 0$ there exists an $N \in \mathbf{N}$ 
such that if both $m, n \geq N$, then $|a_{mn} - a| < \epsilon$. 

(b) Let $a_{mn} = 1/(m + n)$. Does $\lim_{m,n \to \infty} a_{mn}$ exist in this case? Do the two 
iterated limits exist? How do these three values compare? Answer these 
same questions for $a_{mn} = mn/(m^2 + n^2)$. 

(c) Produce an example where $\lim_{m,n \to \infty} a_{mn}$ exists but where neither iterated 
limit can be computed. 

(d) Assume $\lim_{m,n \to \infty} a_{mn} = a$, and assume that for each fixed $m \in \mathbf{N}$, 
$\lim_{n \to \infty} (a_{mn}) \to b_m$. Show $\lim_{m \to \infty} b_m = a$. 

(e) Prove that if $\lim_{m,n \to \infty} a_{mn}$ exists and the iterated limits both exist, then 
all three limits must be equal. 


2.4 The Monotone Convergence Theorem 
and a First Look at Infinite Series 

We showed in Theorem 2.3.2 that convergent sequences are bounded. The 
converse statement is certainly not true. It is not too difficult to produce an 
example of a bounded sequence that does not converge. On the other hand, if 
a bounded sequence is monotone, then in fact it does converge. 

Definition 2.4.1. A sequence $(a_n)$ is increasing if $a_n \leq a_{n+1}$ for all $n \in \mathbf{N}$ and 
decreasing if $a_n \geq a_{n+1}$ for all $n \in \mathbf{N}$. A sequence is monotone if it is either 
increasing or decreasing. 

Theorem 2.4.2 (Monotone Convergence Theorem). If a sequence is mono- 
tone and bounded, then it converges. 

Proof. Let $(a_n)$ be monotone and bounded. To prove $(a_n)$ converges using the 
definition of convergence, we are going to need a candidate for the limit. Let's 
assume the sequence is increasing (the decreasing case is handled similarly), and 
consider the set of points $\{a_n : n \in \mathbf{N}\}$. By assumption, this set is bounded, so 
we can let 

\[ s = \sup\{a_n : n \in \mathbf{N}\}. \]

It seems reasonable to claim that $\lim a_n = s$. 

[Figure: number line showing $a_1, a_2, a_3, \ldots, a_N$ increasing toward $s = \sup\{a_n : n \in \mathbf{N}\}$, with $a_N$ to the right of $s - \epsilon$.] 

To prove this, let $\epsilon > 0$. Because $s$ is the least upper bound for $\{a_n : n \in \mathbf{N}\}$, 
$s - \epsilon$ is not an upper bound, so there exists a point in the sequence $a_N$ such 
that $s - \epsilon < a_N$. Now, the fact that $(a_n)$ is increasing implies that if $n \geq N$, 
then $a_N \leq a_n$. Hence, 

\[ s - \epsilon < a_N \leq a_n \leq s < s + \epsilon, \]

which implies $|a_n - s| < \epsilon$, as desired. 

□ 
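
The search for $N$ in this proof can be traced on a concrete sequence. The sketch below is illustrative only: the sequence $a_n = 1 - 1/n$ (increasing, with supremum 1), the tolerance, and the helper name `first_index_beyond` are assumptions chosen for the example, and the search is capped at a finite `max_n`.

```python
def first_index_beyond(seq, s, eps, max_n=10**6):
    """Smallest N <= max_n with seq(N) > s - eps, or None if none is found."""
    for N in range(1, max_n + 1):
        if seq(N) > s - eps:
            return N
    return None


def a(n):
    """Hypothetical increasing, bounded sequence with sup equal to 1."""
    return 1 - 1 / n


s, eps = 1.0, 0.01
N = first_index_beyond(a, s, eps)
# Because the sequence is increasing, every later term stays within eps of s.
tail_ok = all(s - eps < a(n) <= s for n in range(N, N + 1000))
print(N, tail_ok)
```

Once one term passes $s - \epsilon$, monotonicity does the rest, which is exactly the step $s - \epsilon < a_N \leq a_n \leq s$ in the proof.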


The Monotone Convergence Theorem is extremely useful for the study of 
infinite series, largely because it asserts the convergence of a sequence without 
explicit mention of the actual limit. This is a good moment to do some prelimi- 
nary investigations, so it is time to formalize the relationship between sequences 
and series. 

Definition 2.4.3 (Convergence of a Series). Let $(b_n)$ be a sequence. An 
infinite series is a formal expression of the form 

\[ \sum_{n=1}^{\infty} b_n = b_1 + b_2 + b_3 + b_4 + b_5 + \cdots. \]

We define the corresponding sequence of partial sums $(s_m)$ by 

\[ s_m = b_1 + b_2 + b_3 + \cdots + b_m, \]

and say that the series $\sum_{n=1}^{\infty} b_n$ converges to $B$ if the sequence $(s_m)$ converges 
to $B$. In this case, we write $\sum_{n=1}^{\infty} b_n = B$. 


Example 2.4.4. Consider 

\[ \sum_{n=1}^{\infty} \frac{1}{n^2}. \]

Because the terms in the sum are all positive, the sequence of partial sums 
given by 

\[ s_m = 1 + \frac{1}{4} + \frac{1}{9} + \cdots + \frac{1}{m^2} \]

is increasing. The question is whether or not we can find some upper bound on 
$(s_m)$. To this end, observe 

\[
\begin{aligned}
s_m &= 1 + \frac{1}{2 \cdot 2} + \frac{1}{3 \cdot 3} + \frac{1}{4 \cdot 4} + \cdots + \frac{1}{m^2} \\
&< 1 + \frac{1}{2 \cdot 1} + \frac{1}{3 \cdot 2} + \frac{1}{4 \cdot 3} + \cdots + \frac{1}{m(m-1)} \\
&= 1 + \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{m-1} - \frac{1}{m}\right) \\
&= 1 + 1 - \frac{1}{m} \\
&< 2.
\end{aligned}
\]

Thus, 2 is an upper bound for the sequence of partial sums, so by the Monotone 
Convergence Theorem, $\sum_{n=1}^{\infty} 1/n^2$ converges to some (for the moment) 
unknown limit less than 2. (Finding the value of this limit is the subject of 
Sections 6.1 and 8.3.) 
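
As a quick sanity check on the telescoping estimate, the sketch below computes a few partial sums of $\sum 1/n^2$ next to the bound $2 - 1/m$ obtained above. The function names are hypothetical and floating-point arithmetic is used, so this illustrates the inequality rather than proving it.

```python
def partial_sum_inverse_squares(m):
    """s_m = 1 + 1/4 + 1/9 + ... + 1/m^2."""
    return sum(1 / n**2 for n in range(1, m + 1))


def telescoping_bound(m):
    """The bound 1 + 1 - 1/m from the telescoping comparison."""
    return 2 - 1 / m


for m in (1, 2, 10, 100, 1000):
    print(m, partial_sum_inverse_squares(m), telescoping_bound(m))
```

The partial sums increase with $m$ while staying below the bound, which is the combination the Monotone Convergence Theorem requires.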

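Example 2.4.5 (Harmonic Series). This time, consider the so-called harmonic 
series 

\[ \sum_{n=1}^{\infty} \frac{1}{n}. \]

Again, we have an increasing sequence of partial sums, 

\[ s_m = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{m}, \]

that upon naive inspection appears as though it may be bounded. However, 2 
is no longer an upper bound because 

\[ s_4 = 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) > 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) = 2. \]

A similar calculation shows that $s_8 > 2\frac{1}{2}$, and we can see that in general 

\[
\begin{aligned}
s_{2^k} &= 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^{k-1} + 1} + \cdots + \frac{1}{2^k}\right) \\
&> 1 + \frac{1}{2} + \left(\frac{1}{4} + \frac{1}{4}\right) + \left(\frac{1}{8} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^k} + \cdots + \frac{1}{2^k}\right) \\
&= 1 + \frac{1}{2} + 2\left(\frac{1}{4}\right) + 4\left(\frac{1}{8}\right) + \cdots + 2^{k-1}\left(\frac{1}{2^k}\right) \\
&= 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots + \frac{1}{2} \\
&= 1 + k\left(\frac{1}{2}\right),
\end{aligned}
\]

which is unbounded. Thus, despite the incredibly slow pace, the sequence of 
partial sums of $\sum_{n=1}^{\infty} 1/n$ eventually surpasses every number on the positive real 
line. Because convergent sequences are bounded, the harmonic series diverges. 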
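
The lower bound $s_{2^k} \geq 1 + k/2$ can be confirmed exactly for small $k$. The following sketch uses Python's `fractions.Fraction` so that no rounding is involved; the helper name `harmonic_partial_sum` is a hypothetical choice for this illustration.

```python
from fractions import Fraction


def harmonic_partial_sum(m):
    """Exact value of s_m = 1 + 1/2 + ... + 1/m."""
    return sum(Fraction(1, n) for n in range(1, m + 1))


for k in range(0, 7):
    s = harmonic_partial_sum(2**k)
    lower = 1 + Fraction(k, 2)
    # Compare the partial sum s_{2^k} with the bound 1 + k/2.
    print(k, float(s), float(lower), s >= lower)
```

The partial sums grow only about as fast as a logarithm, yet the bound $1 + k/2$ shows they cannot stay below any fixed number.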
The previous example is a special case of a general argument that can be 
used to determine the convergence or divergence of a large class of infinite series. 

Theorem 2.4.6 (Cauchy Condensation Test). Suppose $(b_n)$ is decreasing 
and satisfies $b_n \geq 0$ for all $n \in \mathbf{N}$. Then, the series $\sum_{n=1}^{\infty} b_n$ converges if and 
only if the series 

\[ \sum_{n=0}^{\infty} 2^n b_{2^n} = b_1 + 2b_2 + 4b_4 + 8b_8 + 16b_{16} + \cdots \]

converges. 

Proof. First, assume that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ converges. Theorem 2.3.2 guarantees 
that the partial sums 

\[ t_k = b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} \]

are bounded; that is, there exists an $M > 0$ such that $t_k \leq M$ for all $k \in \mathbf{N}$. 
We want to prove that $\sum_{n=1}^{\infty} b_n$ converges. Because $b_n \geq 0$, we know that the 
partial sums are increasing, so we only need to show that 

\[ s_m = b_1 + b_2 + b_3 + \cdots + b_m \]

is bounded. 

Fix $m$ and let $k$ be large enough to ensure $m \leq 2^{k+1} - 1$. Then, $s_m \leq s_{2^{k+1}-1}$ 
and 

\[
\begin{aligned}
s_{2^{k+1}-1} &= b_1 + (b_2 + b_3) + (b_4 + b_5 + b_6 + b_7) + \cdots + (b_{2^k} + \cdots + b_{2^{k+1}-1}) \\
&\leq b_1 + (b_2 + b_2) + (b_4 + b_4 + b_4 + b_4) + \cdots + (b_{2^k} + \cdots + b_{2^k}) \\
&= b_1 + 2b_2 + 4b_4 + \cdots + 2^k b_{2^k} = t_k.
\end{aligned}
\]

Thus, $s_m \leq t_k \leq M$, and the sequence $(s_m)$ is bounded. By the Monotone 
Convergence Theorem, we can conclude that $\sum_{n=1}^{\infty} b_n$ converges. 

The proof that $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges implies $\sum_{n=1}^{\infty} b_n$ diverges is similar to 
Example 2.4.5. The details are requested in Exercise 2.4.9. □ 

Corollary 2.4.7. The series $\sum_{n=1}^{\infty} 1/n^p$ converges if and only if $p > 1$. 

A rigorous argument for this corollary requires a few basic facts about geometric 
series. The proof is requested in Exercise 2.7.5 at the end of Section 2.7 
where geometric series are discussed. 
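
The inequality $s_{2^{k+1}-1} \leq t_k$ from the proof can be observed numerically for the series in Corollary 2.4.7. This is a minimal sketch under stated assumptions: the helper names are hypothetical, floating-point sums are used, and only a few values of $p$ and $k$ are sampled.

```python
def partial_sum(b, m):
    """s_m = b(1) + b(2) + ... + b(m)."""
    return sum(b(n) for n in range(1, m + 1))


def condensed_sum(b, k):
    """t_k = b(1) + 2 b(2) + 4 b(4) + ... + 2^k b(2^k)."""
    return sum(2**j * b(2**j) for j in range(k + 1))


def p_series_term(p):
    """Return the term function n -> 1/n^p."""
    return lambda n: 1 / n**p


for p in (1, 2):
    b = p_series_term(p)
    for k in (2, 5, 10):
        m = 2 ** (k + 1) - 1
        print(p, k, partial_sum(b, m), condensed_sum(b, k))
```

For $p = 2$ the condensed sums are partial sums of the geometric series $\sum (1/2)^j$ and stay below 2, while for $p = 1$ they equal $k + 1$ and grow without bound.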


Exercises 

Exercise 2.4.1. (a) Prove that the sequence defined by $x_1 = 3$ and 

\[ x_{n+1} = \frac{1}{4 - x_n} \]

converges. 

(b) Now that we know $\lim x_n$ exists, explain why $\lim x_{n+1}$ must also exist and 
equal the same value. 

(c) Take the limit of each side of the recursive equation in part (a) to explicitly 
compute $\lim x_n$. 

Exercise 2.4.2. (a) Consider the recursively defined sequence $y_1 = 1$, 

\[ y_{n+1} = 3 - y_n, \]

and set $y = \lim y_n$. Because $(y_n)$ and $(y_{n+1})$ have the same limit, taking 
the limit across the recursive equation gives $y = 3 - y$. Solving for $y$, we 
conclude $\lim y_n = 3/2$. 

What is wrong with this argument? 

(b) This time set $y_1 = 1$ and $y_{n+1} = 3 - \frac{1}{y_n}$. Can the strategy in (a) be applied 
to compute the limit of this sequence? 

Exercise 2.4.3. (a) Show that 



converges and find the limit. 

(b) Does the sequence 

\[ \sqrt{2}, \sqrt{2\sqrt{2}}, \sqrt{2\sqrt{2\sqrt{2}}}, \ldots \]

converge? If so, find the limit. 

Exercise 2.4.4. (a) In Section 1.4 we used the Axiom of Completeness (AoC) 

to prove the Archimedean Property of R (Theorem 1.4.2). Show that the 
Monotone Convergence Theorem can also be used to prove the Archimedean 
Property without making any use of AoC. 

(b) Use the Monotone Convergence Theorem to supply a proof for the Nested 
Interval Property (Theorem 1.4.1) that doesn’t make use of AoC. 

These two results suggest that we could have used the Monotone Con- 
vergence Theorem in place of AoC as our starting axiom for building a 
proper theory of the real numbers. 

Exercise 2.4.5 (Calculating Square Roots). Let $x_1 = 2$, and define 

\[ x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right). \]

(a) Show that $x_n^2$ is always greater than or equal to 2, and then use this to 
prove that $x_n - x_{n+1} \geq 0$. Conclude that $\lim x_n = \sqrt{2}$. 

(b) Modify the sequence $(x_n)$ so that it converges to $\sqrt{c}$. 

Exercise 2.4.6 (Arithmetic-Geometric Mean). (a) Explain why $\sqrt{xy} \leq 
(x + y)/2$ for any two positive real numbers $x$ and $y$. (The geometric mean 
is always less than the arithmetic mean.) 

(b) Now let $0 \leq x_1 \leq y_1$ and define 

\[ x_{n+1} = \sqrt{x_n y_n} \quad \text{and} \quad y_{n+1} = \frac{x_n + y_n}{2}. \]

Show $\lim x_n$ and $\lim y_n$ both exist and are equal. 

Exercise 2.4.7 (Limit Superior). Let $(a_n)$ be a bounded sequence. 

(a) Prove that the sequence defined by $y_n = \sup\{a_k : k \geq n\}$ converges. 

(b) The limit superior of $(a_n)$, or $\limsup a_n$, is defined by 

\[ \limsup a_n = \lim y_n, \]

where $y_n$ is the sequence from part (a) of this exercise. Provide a reasonable 
definition for $\liminf a_n$ and briefly explain why it always exists for 
any bounded sequence. 

(c) Prove that $\liminf a_n \leq \limsup a_n$ for every bounded sequence, and give 
an example of a sequence for which the inequality is strict. 

(d) Show that $\liminf a_n = \limsup a_n$ if and only if $\lim a_n$ exists. In this case, 
all three share the same value. 


Exercise 2.4.8. For each series, find an explicit formula for the sequence of 
partial sums and determine if the series converges. 

(a) $\sum_{n=1}^{\infty} \frac{1}{2^n}$ 

(b) $\sum_{n=1}^{\infty} \frac{1}{n(n+1)}$ 

(c) $\sum_{n=1}^{\infty} \log\left(\frac{n+1}{n}\right)$ 

(In (c), $\log(x)$ refers to the natural logarithm function from calculus.) 

Exercise 2.4.9. Complete the proof of Theorem 2.4.6 by showing that if the 
series $\sum_{n=0}^{\infty} 2^n b_{2^n}$ diverges, then so does $\sum_{n=1}^{\infty} b_n$. Example 2.4.5 may be a 
useful reference. 


Exercise 2.4.10 (Infinite Products). A close relative of infinite series is the 
infinite product 

\[ \prod_{n=1}^{\infty} b_n = b_1 b_2 b_3 \cdots, \]

which is understood in terms of its sequence of partial products 

\[ p_m = \prod_{n=1}^{m} b_n = b_1 b_2 b_3 \cdots b_m. \]

Consider the special class of infinite products of the form 

\[ \prod_{n=1}^{\infty} (1 + a_n) = (1 + a_1)(1 + a_2)(1 + a_3) \cdots, \quad \text{where } a_n \geq 0. \]

(a) Find an explicit formula for the sequence of partial products in the case 
where $a_n = 1/n$ and decide whether the sequence converges. Write out 
the first few terms in the sequence of partial products in the case where 
$a_n = 1/n^2$ and make a conjecture about the convergence of this sequence. 

(b) Show, in general, that the sequence of partial products converges if and 
only if $\sum_{n=1}^{\infty} a_n$ converges. (The inequality $1 + x \leq 3^x$ for positive $x$ will 
be useful in one direction.) 


2.5 Subsequences and the Bolzano-Weierstrass 
Theorem 


In Example 2.4.5, we showed that the sequence of partial sums $(s_m)$ of the 
harmonic series does not converge by focusing our attention on a particular 
subsequence $(s_{2^k})$ of the original sequence. For the moment, we will put the 
topic of infinite series aside and more fully develop the important concept of 
subsequences. 

Definition 2.5.1. Let $(a_n)$ be a sequence of real numbers, and let $n_1 < n_2 < 
n_3 < n_4 < n_5 < \ldots$ be an increasing sequence of natural numbers. Then the 
sequence 

\[ (a_{n_1}, a_{n_2}, a_{n_3}, a_{n_4}, a_{n_5}, \ldots) \]

is called a subsequence of $(a_n)$ and is denoted by $(a_{n_k})$, where $k \in \mathbf{N}$ indexes 
the subsequence. 


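Notice that the order of the terms in a subsequence is the same as in the 
original sequence, and repetitions are not allowed. Thus if 

\[ (a_n) = \left(1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \frac{1}{5}, \frac{1}{6}, \cdots\right), \]

then 

\[ \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{6}, \frac{1}{8}, \cdots\right) \quad \text{and} \quad \left(\frac{1}{10}, \frac{1}{100}, \frac{1}{1000}, \frac{1}{10000}, \cdots\right) \]

are examples of legitimate subsequences, whereas 

\[ \left(\frac{1}{10}, \frac{1}{5}, \frac{1}{100}, \frac{1}{50}, \frac{1}{1000}, \frac{1}{500}, \cdots\right) \quad \text{and} \quad \left(1, 1, \frac{1}{3}, \frac{1}{3}, \frac{1}{5}, \frac{1}{5}, \cdots\right) \]

are not. 

Theorem 2.5.2. Subsequences of a convergent sequence converge to the same 
limit as the original sequence. 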


Proof. Assume $(a_n) \to a$, and let $(a_{n_k})$ be a subsequence. Given $\epsilon > 0$, there 
exists $N$ such that $|a_n - a| < \epsilon$ whenever $n \geq N$. Because $n_k \geq k$ for all $k$, 
the same $N$ will suffice for the subsequence; that is, $|a_{n_k} - a| < \epsilon$ whenever 
$k \geq N$. □ 


This not too surprising result has several somewhat surprising applications. 
It is the key ingredient for understanding when infinite sums are associative 
(Exercise 2.5.3). We can also use it in the following clever way to compute 
values of some familiar limits. 

Example 2.5.3. Let $0 < b < 1$. Because 

\[ b > b^2 > b^3 > b^4 > \cdots > 0, \]

the sequence $(b^n)$ is decreasing and bounded below. The Monotone Convergence 
Theorem allows us to conclude that $(b^n)$ converges to some $l$ satisfying $b > l \geq 0$. 
To compute $l$, notice that $(b^{2n})$ is a subsequence, so $(b^{2n}) \to l$ by Theorem 2.5.2. 
But $b^{2n} = b^n \cdot b^n$, so by the Algebraic Limit Theorem, $(b^{2n}) \to l \cdot l = l^2$. Because 
limits are unique (Theorem 2.2.7), $l^2 = l$, and since $l < b < 1$ rules out $l = 1$, we get $l = 0$. 

Without much trouble (Exercise 2.5.7), we can generalize this example to 
conclude $(b^n) \to 0$ if and only if $-1 < b < 1$. 
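
The subsequence identity $b^{2n} = b^n \cdot b^n$ behind this argument is easy to watch numerically. The sketch below is only an illustration: the value $b = 0.9$ and the helper name `powers` are hypothetical choices, and floating-point arithmetic is used.

```python
def powers(b, count):
    """Return the list [b, b^2, ..., b^count]."""
    return [b**n for n in range(1, count + 1)]


b = 0.9  # hypothetical choice with 0 < b < 1
terms = powers(b, 200)
for n in (1, 10, 50, 100):
    # b^(2n) is a term of the subsequence and also the square of b^n.
    print(n, terms[n - 1], terms[2 * n - 1], terms[n - 1] ** 2)
```

The last two columns agree up to rounding, and all three shrink toward 0, consistent with the only admissible solution of $l^2 = l$.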

Example 2.5.4 (Divergence Criterion). Theorem 2.5.2 is also useful for 
providing economical proofs for divergence. In Example 2.2.8, we were quite 
sure that 

\[ \left(1, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{4}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, -\frac{1}{5}, \frac{1}{5}, \cdots\right) \]

did not converge to any proposed limit. Notice that 

\[ \left(\frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \frac{1}{5}, \cdots\right) \]

is a subsequence that converges to $1/5$. Also, 

\[ \left(-\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, -\frac{1}{5}, \cdots\right) \]

is a different subsequence of the original sequence that converges to $-1/5$. 
Because we have two subsequences converging to two different limits, we can 
rigorously conclude that the original sequence diverges. 

The Bolzano-Weierstrass Theorem 

In the previous example, it was rather easy to spot a convergent subsequence 
(or two) hiding in the original sequence. For bounded sequences, it turns out 
that it is always possible to find at least one such convergent subsequence. 

Theorem 2.5.5 (Bolzano-Weierstrass Theorem). Every bounded sequence 
contains a convergent subsequence. 

Proof. Let $(a_n)$ be a bounded sequence so that there exists $M > 0$ satisfying 
$|a_n| \leq M$ for all $n \in \mathbf{N}$. Bisect the closed interval $[-M, M]$ into the two closed 
intervals $[-M, 0]$ and $[0, M]$. (The midpoint is included in both halves.) Now, it 
must be that at least one of these closed intervals contains an infinite number of 
the terms in the sequence $(a_n)$. Select a half for which this is the case and label 
that interval as $I_1$. Then, let $a_{n_1}$ be some term in the sequence $(a_n)$ satisfying 
$a_{n_1} \in I_1$. 

[Figure: the interval $[-M, M]$ with midpoint 0, showing the nested subintervals $I_1$ and $I_2$.] 

Next, we bisect $I_1$ into closed intervals of equal length, and let $I_2$ be a half 
that again contains an infinite number of terms of the original sequence. Because 
there are an infinite number of terms from $(a_n)$ to choose from, we can select 
an $a_{n_2}$ from the original sequence with $n_2 > n_1$ and $a_{n_2} \in I_2$. In general, we 
construct the closed interval $I_k$ by taking a half of $I_{k-1}$ containing an infinite 
number of terms of $(a_n)$ and then select $n_k > n_{k-1} > \cdots > n_2 > n_1$ so that 
$a_{n_k} \in I_k$. 

We want to argue that $(a_{n_k})$ is a convergent subsequence, but we need a 
candidate for the limit. The sets 

\[ I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots \]

form a nested sequence of closed intervals, and by the Nested Interval Property 
there exists at least one point $x \in \mathbf{R}$ contained in every $I_k$. This provides us 
with the candidate we were looking for. It just remains to show that $(a_{n_k}) \to x$. 

Let $\epsilon > 0$. By construction, the length of $I_k$ is $M(1/2)^{k-1}$, which converges 
to zero. (This follows from Example 2.5.3 and the Algebraic Limit Theorem.) 
Choose $N$ so that $k \geq N$ implies that the length of $I_k$ is less than $\epsilon$. Because $x$ 
and $a_{n_k}$ are both in $I_k$, it follows that $|a_{n_k} - x| < \epsilon$. 

□ 
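
The bisection construction can be imitated on a finite list of terms. The sketch below is a heuristic illustration, not an implementation of the proof: a program cannot test whether a half contains infinitely many terms, so it keeps the half holding more of the remaining listed terms as a stand-in. The function name `bisection_subsequence` and the sample sequence $a_n = (-1)^n(1 + 1/n)$ are assumptions made for the example.

```python
def bisection_subsequence(terms, M, steps):
    """Choose increasing 0-based indices by repeated bisection of [-M, M].

    terms is a finite prefix a_1, ..., a_K of a sequence with |a_n| <= M.
    """
    lo, hi = -M, M
    chosen = []
    last = -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        # Only terms after the last chosen index are candidates.
        later = [i for i in range(last + 1, len(terms)) if lo <= terms[i] <= hi]
        left = [i for i in later if terms[i] <= mid]
        right = [i for i in later if terms[i] >= mid]
        # Finite stand-in for "this half contains infinitely many terms".
        if len(left) >= len(right):
            hi, half = mid, left
        else:
            lo, half = mid, right
        if not half:
            break
        last = half[0]
        chosen.append(last)
    return chosen


M = 2.0
terms = [(-1) ** n * (1 + 1 / n) for n in range(1, 2001)]
chosen = bisection_subsequence(terms, M, steps=8)
print([(i + 1, terms[i]) for i in chosen])
```

Each selected term lies in the current interval $I_k$, whose length is $M(1/2)^{k-1}$, so later selections are forced closer and closer together, mirroring the final step of the proof.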
Exercises 

Exercise 2.5.1. Give an example of each of the following, or argue that such 
a request is impossible. 

(a) A sequence that has a subsequence that is bounded but contains no sub- 
sequence that converges. 

(b) A sequence that does not contain 0 or 1 as a term but contains subse- 
quences converging to each of these values. 

(c) A sequence that contains subsequences converging to every point in the 
infinite set {1, 1/2, 1/3, 1/4, 1/5, . . .}. 

(d) A sequence that contains subsequences converging to every point in the 
infinite set {1, 1/2, 1/3, 1/4, 1/5,...}, and no subsequences converging to 
points outside of this set. 

Exercise 2.5.2. Decide whether the following propositions are true or false, 
providing a short justification for each conclusion. 

(a) If every proper subsequence of $(x_n)$ converges, then $(x_n)$ converges as well. 

(b) If $(x_n)$ contains a divergent subsequence, then $(x_n)$ diverges. 

(c) If $(x_n)$ is bounded and diverges, then there exist two subsequences of $(x_n)$ 
that converge to different limits. 

(d) If $(x_n)$ is monotone and contains a convergent subsequence, then $(x_n)$ 
converges. 

Exercise 2.5.3. (a) Prove that if an infinite series converges, then the associative 
property holds. Assume $a_1 + a_2 + a_3 + a_4 + a_5 + \cdots$ converges to 
a limit $L$ (i.e., the sequence of partial sums $(s_n) \to L$). Show that any 
regrouping of the terms 

\[ (a_1 + a_2 + \cdots + a_{n_1}) + (a_{n_1 + 1} + \cdots + a_{n_2}) + (a_{n_2 + 1} + \cdots + a_{n_3}) + \cdots \]

leads to a series that also converges to $L$. 

(b) Compare this result to the example discussed at the end of Section 2.1 
where infinite addition was shown not to be associative. Why doesn't our 
proof in (a) apply to this example? 

Exercise 2.5.4. The Bolzano-Weierstrass Theorem is extremely important, 
and so is the strategy employed in the proof. To gain some more experience 
with this technique, assume the Nested Interval Property is true and use it 
to provide a proof of the Axiom of Completeness. To prevent the argument 
from being circular, assume also that $(1/2^n) \to 0$. (Why precisely is this last 
assumption needed to avoid circularity?) 

Exercise 2.5.5. Assume $(a_n)$ is a bounded sequence with the property that 
every convergent subsequence of $(a_n)$ converges to the same limit $a \in \mathbf{R}$. Show 
that $(a_n)$ must converge to $a$. 

Exercise 2.5.6. Use a similar strategy to the one in Example 2.5.3 to show 
$\lim b^{1/n}$ exists for all $b \geq 0$ and find the value of the limit. (The results in 
Exercise 2.3.1 may be assumed.) 

Exercise 2.5.7. Extend the result proved in Example 2.5.3 to the case $|b| < 1$; 
that is, show $\lim(b^n) = 0$ if and only if $-1 < b < 1$. 

Exercise 2.5.8. Another way to prove the Bolzano-Weierstrass Theorem is to 
show that every sequence contains a monotone subsequence. A useful device in 
this endeavor is the notion of a peak term. Given a sequence $(x_n)$, a particular 
term $x_m$ is a peak term if no later term in the sequence exceeds it; i.e., if 
$x_m \geq x_n$ for all $n \geq m$. 

(a) Find examples of sequences with zero, one, and two peak terms. Find 
an example of a sequence with infinitely many peak terms that is not 
monotone. 

(b) Show that every sequence contains a monotone subsequence and explain 
how this furnishes a new proof of the Bolzano-Weierstrass Theorem. 

Exercise 2.5.9. Let $(a_n)$ be a bounded sequence, and define the set 

\[ S = \{x \in \mathbf{R} : x < a_n \text{ for infinitely many terms } a_n\}. \]

Show that there exists a subsequence $(a_{n_k})$ converging to $s = \sup S$. (This is a 
direct proof of the Bolzano-Weierstrass Theorem using the Axiom of 
Completeness.) 


2.6 The Cauchy Criterion 


The following definition bears a striking resemblance to the definition of con- 
vergence for a sequence. 

Definition 2.6.1. A sequence $(a_n)$ is called a Cauchy sequence if, for every 
$\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever $m, n \geq N$ it follows that 
$|a_n - a_m| < \epsilon$. 

To make the comparison easier, let's restate the definition of convergence. 

Definition 2.2.3. A sequence $(a_n)$ converges to a real number $a$ if, for every 
$\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever $n \geq N$ it follows that 
$|a_n - a| < \epsilon$. 


As we have discussed, the definition of convergence asserts that, given an 
arbitrary positive $\epsilon$, it is possible to find a point in the sequence after which 
the terms of the sequence are all closer to the limit $a$ than the given $\epsilon$. On the 
other hand, a sequence is a Cauchy sequence if, for every $\epsilon$, there is a point 
in the sequence after which the terms are all closer to each other than the 
given $\epsilon$. To spoil the surprise, we will argue in this section that in fact these 
two definitions are equivalent: Convergent sequences are Cauchy sequences, 
and Cauchy sequences converge. The significance of the definition of a Cauchy 
sequence is that there is no mention of a limit. This is somewhat like the 
situation with the Monotone Convergence Theorem in that we will have another 
way of proving that sequences converge without having any explicit knowledge 
of what the limit might be. 

Theorem 2.6.2. Every convergent sequence is a Cauchy sequence. 

Proof. Assume $(x_n)$ converges to $x$. To prove that $(x_n)$ is Cauchy, we must 
find a point in the sequence after which we have $|x_n - x_m| < \epsilon$. This can be 
done using an application of the triangle inequality. The details are requested 
in Exercise 2.6.1. □ 


The converse is a bit more difficult to prove, mainly because, in order to prove 
that a sequence converges, we must have a proposed limit for the sequence to 
approach. We have been in this situation before in the proofs of the Monotone 
Convergence Theorem and the Bolzano-Weierstrass Theorem. Our strategy 
here will be to use the Bolzano-Weierstrass Theorem. This is the reason for the 
next lemma. (Compare this with Theorem 2.3.2.) 

Lemma 2.6.3. Cauchy sequences are bounded. 


Proof. Given $\epsilon = 1$, there exists an $N$ such that $|x_m - x_n| < 1$ for all $m, n \geq N$. 
Thus, we must have $|x_n| < |x_N| + 1$ for all $n \geq N$. It follows that 

\[ M = \max\{|x_1|, |x_2|, |x_3|, \ldots, |x_{N-1}|, |x_N| + 1\} \]

is a bound for the sequence $(x_n)$. □ 
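
The bound $M$ in this proof can be computed for a concrete sequence. The sketch below assumes the hypothetical sequence $x_n = 10/n$, for which $N = 10$ works with $\epsilon = 1$ because all terms from the tenth onward lie in $(0, 1]$; the helper name `cauchy_bound` is likewise an assumption.

```python
def cauchy_bound(x, N):
    """M = max{|x_1|, ..., |x_{N-1}|, |x_N| + 1} for a sequence given as x(n)."""
    head = [abs(x(n)) for n in range(1, N)]
    return max(head + [abs(x(N)) + 1])


def x(n):
    """Hypothetical Cauchy sequence x_n = 10/n."""
    return 10 / n


N = 10  # for m, n >= 10 both terms lie in (0, 1], so |x_m - x_n| < 1
M = cauchy_bound(x, N)
print(M, all(abs(x(n)) <= M for n in range(1, 10_000)))
```

The finitely many early terms are handled by the maximum, and the single estimate $|x_n| < |x_N| + 1$ takes care of the entire tail.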

Theorem 2.6.4 (Cauchy Criterion). A sequence converges if and only if it 
is a Cauchy sequence. 

Proof. ($\Rightarrow$) This direction is Theorem 2.6.2. 

($\Leftarrow$) For this direction, we start with a Cauchy sequence $(x_n)$. Lemma 2.6.3 
guarantees that $(x_n)$ is bounded, so we may use the Bolzano-Weierstrass Theorem 
to produce a convergent subsequence $(x_{n_k})$. Set 

\[ x = \lim x_{n_k}. \]

The idea is to show that the original sequence $(x_n)$ converges to this same limit. 
Once again, we will use a triangle inequality argument. We know the terms 
in the subsequence are getting close to the limit $x$, and the assumption that 
$(x_n)$ is Cauchy implies the terms in the “tail” of the sequence are close to each 
other. Thus, we want to make each of these distances less than half of the 
prescribed $\epsilon$. 

Let $\epsilon > 0$. Because $(x_n)$ is Cauchy, there exists $N$ such that 

\[ |x_n - x_m| < \frac{\epsilon}{2} \]

whenever $m, n \geq N$. Now, we also know that $(x_{n_k}) \to x$, so choose a term in 
this subsequence, call it $x_{n_K}$, with $n_K \geq N$ and 

\[ |x_{n_K} - x| < \frac{\epsilon}{2}. \]

To see that $N$ has the desired property (for the original sequence $(x_n)$), observe 
that if $n \geq N$, then 

\[
\begin{aligned}
|x_n - x| &= |x_n - x_{n_K} + x_{n_K} - x| \\
&\leq |x_n - x_{n_K}| + |x_{n_K} - x| \\
&< \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.
\end{aligned}
\]

□ 


The Cauchy Criterion is named after the French mathematician Augustin 
Louis Cauchy. Cauchy is a major figure in the history of many branches of 
mathematics — number theory and the theory of finite groups, to name a few — 
but he is most widely recognized for his enormous contributions in analysis, 
especially complex analysis. He is deservedly credited with inventing the 
$\epsilon$-based definition of limits we use today, although it is probably better to view 
him as a pioneer of analysis in the sense that his work did not attain the level 
of refinement that modern mathematicians have come to expect. The Cauchy 
Criterion, for instance, was devised and used by Cauchy to study infinite series, 
but he never actually proved it in both directions. The fact that there were 
gaps in Cauchy’s work should not diminish his brilliance in any way. The 
issues of the day were both difficult and subtle, and Cauchy was far and away 
the most influential in laying the groundwork for modern standards of rigor. 
Karl Weierstrass played a major role in sharpening Cauchy’s arguments. We 
will hear a good deal more from Weierstrass, most notably in Chapter 6 when 
we take up uniform convergence. Bernhard Bolzano was working in Prague 
and was writing and thinking about many of these same issues surrounding 
limits and continuity. Because his work was not widely available to the rest 
of the mathematical community, his historical reputation never achieved the 
distinction that his impressive accomplishments would seem to merit. 


Completeness Revisited 

In the first chapter, we established the Axiom of Completeness (AoC) to be the 
assertion that nonempty sets bounded above have least upper bounds. We then 
used this axiom as the crucial step in the proof of the Nested Interval Property 
(NIP). In this chapter, AoC was the central step in the Monotone Convergence 
Theorem (MCT), and NIP was the key to proving the Bolzano-Weierstrass 
Theorem (BW). Finally, we needed BW in our proof of the Cauchy Criterion 
(CC) for convergent sequences. The list of implications then looks like 

$$\text{AoC} \Rightarrow \begin{cases} \text{NIP} \Rightarrow \text{BW} \Rightarrow \text{CC} \\ \text{MCT}. \end{cases}$$


But this one-directional list is not the whole story. Recall that in our original 
discussions about completeness, the fundamental problem was that the rational 
numbers contained “gaps.” The reason for moving from the rational numbers 
to the real numbers to do analysis is so that when we encounter a sequence that 
looks as if it is converging to some number, say $\sqrt{2}$, then we can be assured 
that there is indeed a number there that we can call the limit. The assertion 
that “nonempty sets bounded above have least upper bounds” is simply one 
way to mathematically articulate our insistence that there be no “holes” in our 
ordered field, but it is not the only way. Instead, we could have taken MCT to 
be our defining axiom and used it to prove NIP and the existence of least upper 
bounds. This is the content of Exercise 2.4.4. 

How about NIP? Could this property serve as a starting point for a proper 
axiomatic treatment of the real numbers? Almost. In Exercise 2.5.4 we showed 
that NIP implies AoC, but to prevent the argument from making implicit use 
of AoC we needed an extra assumption that is equivalent to the Archimedean 
Property (Theorem 1.4.2). This extra hypothesis is unavoidable. Whereas AoC 
and MCT can both be used to prove that N is not a bounded subset of R, there 
is no way to prove this same fact starting from NIP. The upshot is that NIP 
is a perfectly reasonable candidate to use as the fundamental axiom of the real 
numbers provided that we also include the Archimedean Property as a second 
unproven assumption. 

In fact, if we assume the Archimedean Property holds, then AoC, NIP, MCT, 
BW, and CC are equivalent in the sense that once we take any one of them to 
be true, it is possible to derive the other four. However, because we have an 
example of an ordered field that is not complete — namely, the set of rational 
numbers — we know it is impossible to prove any of them using only the field 
and order properties. Just how we decide which should be the axiom and which 
then become theorems depends largely on preference and context, and in the 
end is not especially significant. What is important is that we understand all of 
these results as belonging to the same family, each asserting the completeness 
of R in its own particular language. 

One loose end in this conversation is the curious and somewhat unpredictable 
relationship of the Archimedean Property to these other results. As we have 
mentioned, the Archimedean Property follows as a consequence of AoC as well 
as MCT, but not from NIP. Starting from BW, it is possible to prove MCT and 
thus also the Archimedean Property. On the other hand, the Cauchy Criterion 
is like NIP in that it cannot be used on its own to prove the Archimedean 
Property. 1 

1 A thorough account of the logical dependence between these various results can be found 
in [23]. 

Exercises 

Exercise 2.6.1. Supply a proof for Theorem 2.6.2. 

Exercise 2.6.2. Give an example of each of the following, or argue that such 
a request is impossible. 

(a) A Cauchy sequence that is not monotone. 

(b) A Cauchy sequence with an unbounded subsequence. 

(c) A divergent monotone sequence with a Cauchy subsequence. 

(d) An unbounded sequence containing a subsequence that is Cauchy. 

Exercise 2.6.3. If $(x_n)$ and $(y_n)$ are Cauchy sequences, then one easy way 
to prove that $(x_n + y_n)$ is Cauchy is to use the Cauchy Criterion. By Theorem 2.6.4, 
$(x_n)$ and $(y_n)$ must be convergent, and the Algebraic Limit Theorem 
then implies $(x_n + y_n)$ is convergent and hence Cauchy. 

(a) Give a direct argument that $(x_n + y_n)$ is a Cauchy sequence that does not 
use the Cauchy Criterion or the Algebraic Limit Theorem. 

(b) Do the same for the product $(x_n y_n)$. 

Exercise 2.6.4. Let $(a_n)$ and $(b_n)$ be Cauchy sequences. Decide whether each 
of the following sequences is a Cauchy sequence, justifying each conclusion. 


(a) $c_n = |a_n - b_n|$ 


(b) $c_n = (-1)^n a_n$ 

(c) $c_n = [[a_n]]$, where $[[x]]$ refers to the greatest integer less than or equal to $x$. 


Exercise 2.6.5. Consider the following (invented) definition: A sequence $(s_n)$ 
is pseudo-Cauchy if, for all $\epsilon > 0$, there exists an $N$ such that if $n \ge N$, then 

$$|s_{n+1} - s_n| < \epsilon.$$

Decide which one of the following two propositions is actually true. Supply 
a proof for the valid statement and a counterexample for the other. 

(i) Pseudo-Cauchy sequences are bounded. 

(ii) If $(x_n)$ and $(y_n)$ are pseudo-Cauchy, then $(x_n + y_n)$ is pseudo-Cauchy as 
well. 


Exercise 2.6.6. Let’s call a sequence $(a_n)$ quasi-increasing if for all $\epsilon > 0$ there 
exists an $N$ such that whenever $n > m \ge N$ it follows that $a_n > a_m - \epsilon$. 

(a) Give an example of a sequence that is quasi-increasing but not monotone 
or eventually monotone. 

(b) Give an example of a quasi-increasing sequence that is divergent and not 
monotone. 

(c) Is there an analogue of the Monotone Convergence Theorem for quasi- 
increasing sequences? Give an example of a bounded, quasi-increasing 
sequence that doesn’t converge, or prove that no such sequence exists. 

Exercise 2.6.7. Exercises 2.4.4 and 2.5.4 establish the equivalence of the Axiom 
of Completeness and the Monotone Convergence Theorem. They also show the 
Nested Interval Property is equivalent to these other two in the presence of the 
Archimedean Property. 

(a) Assume the Bolzano-Weierstrass Theorem is true and use it to construct a 
proof of the Monotone Convergence Theorem without making any appeal 
to the Archimedean Property. This shows that BW, AoC, and MCT are 
all equivalent. 

(b) Use the Cauchy Criterion to prove the Bolzano-Weierstrass Theorem, and 
find the point in the argument where the Archimedean Property is implicitly 
required. This establishes the final link in the equivalence of the five 
characterizations of completeness discussed at the end of Section 2.6. 

(c) How do we know it is impossible to prove the Axiom of Completeness 
starting from the Archimedean Property? 

2.7 Properties of Infinite Series 

Given an infinite series $\sum_{k=1}^\infty a_k$, it is important to keep a clear distinction 
between 

(i) the sequence of terms: $(a_1, a_2, a_3, \ldots)$ and 

(ii) the sequence of partial sums: $(s_1, s_2, s_3, \ldots)$, where $s_n = a_1 + a_2 + \cdots + a_n$. 

The convergence of the series $\sum_{k=1}^\infty a_k$ is defined in terms of the sequence $(s_n)$. 
Specifically, the statement 

$$\sum_{k=1}^\infty a_k = A \quad \text{means that} \quad \lim s_n = A.$$

It is for this reason that we can immediately translate many of our results from 
the study of sequences into statements about the behavior of infinite series. 
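
The distinction between the sequence of terms and the sequence of partial sums can be made concrete with a short computation. The sketch below is only an illustration: the choice $a_k = 1/2^k$ and the helper names `terms` and `partial_sums` are assumptions made for this example, not anything taken from the text.

```python
from fractions import Fraction
from itertools import accumulate


def terms(n):
    """First n terms a_k = 1/2^k for k = 1, ..., n (an illustrative choice)."""
    return [Fraction(1, 2**k) for k in range(1, n + 1)]


def partial_sums(a):
    """Partial sums s_n = a_1 + ... + a_n of a finite list of terms."""
    return list(accumulate(a))


a = terms(6)
s = partial_sums(a)
print("terms:       ", [str(x) for x in a])   # the sequence (a_k)
print("partial sums:", [str(x) for x in s])   # the sequence (s_n)
```

The terms shrink toward 0 while the partial sums approach 1; convergence of the series is a statement about the second list only.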

Theorem 2.7.1 (Algebraic Limit Theorem for Series). If $\sum_{k=1}^\infty a_k = A$ 
and $\sum_{k=1}^\infty b_k = B$, then 

(i) $\sum_{k=1}^\infty c a_k = cA$ for all $c \in \mathbf{R}$ and 

(ii) $\sum_{k=1}^\infty (a_k + b_k) = A + B$. 

Proof. (i) In order to show that $\sum_{k=1}^\infty c a_k = cA$, we must argue that the 
sequence of partial sums 

$$t_m = c a_1 + c a_2 + c a_3 + \cdots + c a_m$$

converges to $cA$. But we are given that $\sum_{k=1}^\infty a_k$ converges to $A$, meaning that 
the partial sums 

$$s_m = a_1 + a_2 + a_3 + \cdots + a_m$$

converge to $A$. Because $t_m = c s_m$, applying the Algebraic Limit Theorem for 
sequences (Theorem 2.3.3) yields $(t_m) \to cA$, as desired. 

The proof of part (ii) is analogous and is left as an unofficial exercise. □ 

One way to summarize Theorem 2.7.1 (i) is to say that infinite addition still 
satisfies the distributive property. Part (ii) verifies that series can be added in 
the usual way. Missing from this theorem is any statement about the product of 
two infinite series. At the heart of this question is the issue of commutativity, 
which requires a more delicate analysis and so is postponed until Section 2.8. 

Theorem 2.7.2 (Cauchy Criterion for Series). The series $\sum_{k=1}^\infty a_k$ converges 
if and only if, given $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that whenever 
$n > m \ge N$ it follows that 

$$|a_{m+1} + a_{m+2} + \cdots + a_n| < \epsilon.$$

Proof. Observe that 

$$|s_n - s_m| = |a_{m+1} + a_{m+2} + \cdots + a_n|$$

and apply the Cauchy Criterion for sequences. □


The Cauchy Criterion leads to economical proofs of several basic facts about 
series. 


Theorem 2.7.3. If the series $\sum_{k=1}^\infty a_k$ converges, then $(a_k) \to 0$. 

Proof. Consider the special case n = m + 1 in the Cauchy Criterion for Series. 

□ 


Every statement of this result should be accompanied with a reminder to 
look at the harmonic series (Example 2.4.5) to erase any misconception that the 
converse statement is true. Knowing $(a_k)$ tends to 0 does not imply that the 
series converges. 
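
A quick numerical look at the harmonic series makes the warning tangible. The sketch below is illustrative only; the function name `harmonic_partial_sum` is an assumption, and the comparison value $1 + k/2$ is the standard lower bound for the partial sum $s_{2^k}$ of the harmonic series.

```python
def harmonic_partial_sum(n):
    """s_n = 1 + 1/2 + ... + 1/n, computed in floating point."""
    return sum(1.0 / k for k in range(1, n + 1))


for k in (0, 5, 10, 15):
    n = 2**k
    # The term 1/n shrinks to 0 while s_n stays at or above 1 + k/2.
    print(n, 1.0 / n, harmonic_partial_sum(n), 1 + k / 2)
```

The second column tends to 0 while the third column keeps growing, so the terms tending to 0 cannot by itself force convergence.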

Theorem 2.7.4 (Comparison Test). Assume $(a_k)$ and $(b_k)$ are sequences 
satisfying $0 \le a_k \le b_k$ for all $k \in \mathbf{N}$. 

(i) If $\sum_{k=1}^\infty b_k$ converges, then $\sum_{k=1}^\infty a_k$ converges. 

(ii) If $\sum_{k=1}^\infty a_k$ diverges, then $\sum_{k=1}^\infty b_k$ diverges. 

Proof. Both statements follow immediately from the Cauchy Criterion for Series 
and the observation that 

$$|a_{m+1} + a_{m+2} + \cdots + a_n| \le |b_{m+1} + b_{m+2} + \cdots + b_n|.$$

Alternate proofs using the Monotone Convergence Theorem are requested in 
the exercises. □


This is a good point to remind ourselves again that statements about con- 
vergence of sequences and series are immune to changes in some finite number 
of initial terms. In the Comparison Test, the requirement that $0 \le a_k \le b_k$ 
does not really need to hold for all $k \in \mathbf{N}$ but just needs to be eventually true. 
A weaker, but sufficient, hypothesis would be to assume that there exists some 
point $M \in \mathbf{N}$ such that the inequality $0 \le a_k \le b_k$ is true for all $k \ge M$. 

The Comparison Test is used to deduce the convergence or divergence of one 
series based on the behavior of another. Thus, for this test to be of any great 
use, we need a catalog of series we can use as measuring sticks. In Section 2.4, 
we proved the Cauchy Condensation Test, which led to the general statement 
that the series $\sum_{n=1}^\infty 1/n^p$ converges if and only if $p > 1$. 

The next example summarizes the situation for another important class of 
series. 


Example 2.7.5 (Geometric Series). A series is called geometric if it is of 
the form 

$$\sum_{k=0}^\infty a r^k = a + ar + ar^2 + ar^3 + \cdots.$$

If $r = 1$ and $a \ne 0$, the series evidently diverges. For $r \ne 1$, the algebraic 
identity 

$$(1 - r)(1 + r + r^2 + r^3 + \cdots + r^{m-1}) = 1 - r^m$$

enables us to rewrite the partial sum 

$$s_m = a + ar + ar^2 + ar^3 + \cdots + ar^{m-1} = \frac{a(1 - r^m)}{1 - r}.$$

Now the Algebraic Limit Theorem (for sequences) and Example 2.5.3 justify 
the conclusion 

$$\sum_{k=0}^\infty a r^k = \frac{a}{1 - r}$$

if and only if $|r| < 1$. 
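
The closed form for the geometric partial sums is easy to check with exact rational arithmetic. The following is a minimal sketch; the function names `geometric_partial_sum` and `closed_form` and the sample values $a = 3$, $r = 1/2$ are assumptions chosen for illustration.

```python
from fractions import Fraction


def geometric_partial_sum(a, r, m):
    """s_m = a + ar + ... + ar^(m-1), summed term by term."""
    return sum(a * r**k for k in range(m))


def closed_form(a, r, m):
    """a(1 - r^m)/(1 - r), valid for r != 1."""
    return a * (1 - r**m) / (1 - r)


a0, r0 = Fraction(3), Fraction(1, 2)
for m in (1, 5, 20):
    # Both columns agree exactly because Fraction arithmetic has no rounding.
    print(m, geometric_partial_sum(a0, r0, m), closed_form(a0, r0, m))
print("a/(1 - r) =", a0 / (1 - r0))
```

With $|r| < 1$ the factor $r^m$ dies out, which is why the partial sums settle on $a/(1 - r)$.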


Although the Comparison Test requires that the terms of the series be posi- 
tive, it is often used in conjunction with the next theorem to handle series that 
contain some negative terms. 


Theorem 2.7.6 (Absolute Convergence Test). If the series $\sum_{n=1}^\infty |a_n|$ converges, 
then $\sum_{n=1}^\infty a_n$ converges as well. 

Proof. This proof makes use of both the necessity (the “only if” direction) and the 
sufficiency (the “if” direction) of the Cauchy Criterion for Series. Because 
$\sum_{n=1}^\infty |a_n|$ converges, we know that, given an $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such 
that 

$$|a_{m+1}| + |a_{m+2}| + \cdots + |a_n| < \epsilon$$

for all $n > m \ge N$. By the triangle inequality, 

$$|a_{m+1} + a_{m+2} + \cdots + a_n| \le |a_{m+1}| + |a_{m+2}| + \cdots + |a_n|,$$

so the sufficiency of the Cauchy Criterion guarantees that $\sum_{n=1}^\infty a_n$ also 
converges. □


The converse of this theorem is false. In the opening discussion of this 
chapter, we considered the alternating harmonic series 

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots.$$

Taking absolute values of the terms gives us the harmonic series $\sum_{n=1}^\infty 1/n$, 
which we have seen diverges. However, it is not too difficult to prove that with 
the alternating negative signs the series indeed converges. This is a special case 
of the Alternating Series Test. 
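
The convergence of the alternating harmonic series can be watched numerically. This is a sketch only: the function name `alt_harmonic_partial_sum` is an assumption, and the comparison with $\ln 2$ uses the well-known value of this series, which is not derived in this section.

```python
import math


def alt_harmonic_partial_sum(n):
    """s_n = 1 - 1/2 + 1/3 - ... +/- 1/n, computed in floating point."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))


for n in (10, 11, 1000, 1001):
    # Even-indexed sums sit below the limit, odd-indexed sums sit above it.
    print(n, alt_harmonic_partial_sum(n), math.log(2))
```

The even and odd partial sums squeeze the limit from below and above, which is the picture behind the proofs requested in Exercise 2.7.1.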

Theorem 2.7.7 (Alternating Series Test). Let $(a_n)$ be a sequence satisfying, 

(i) $a_1 \ge a_2 \ge a_3 \ge \cdots \ge a_n \ge a_{n+1} \ge \cdots$ and 

(ii) $(a_n) \to 0$. 

Then, the alternating series $\sum_{n=1}^\infty (-1)^{n+1} a_n$ converges. 

Proof. A consequence of conditions (i) and (ii) is that $a_n \ge 0$. Several proofs of 
this theorem are outlined in Exercise 2.7.1. □ 


Definition 2.7.8. If $\sum_{n=1}^\infty |a_n|$ converges, then we say that the original series 
$\sum_{n=1}^\infty a_n$ converges absolutely. If, on the other hand, the series $\sum_{n=1}^\infty a_n$ 
converges but the series of absolute values $\sum_{n=1}^\infty |a_n|$ does not converge, then we 
say that the original series $\sum_{n=1}^\infty a_n$ converges conditionally. 


In terms of this newly defined jargon, we have shown that 

$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}$$

converges conditionally, whereas 

$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n^2}, \qquad \sum_{n=1}^\infty \frac{1}{2^n}, \qquad \text{and} \qquad \sum_{n=1}^\infty \frac{(-1)^{n+1}}{2^n}$$

converge absolutely. In particular, any convergent series with (all but finitely 
many) positive terms must converge absolutely. 

The Alternating Series Test is the most accessible test for conditional con- 
vergence, but several others are explored in the exercises. In particular, Abel’s 
Test, outlined in Exercise 2.7.13, will prove useful in our investigations of power 
series in Chapter 6. 


Rearrangements 

Informally speaking, a rearrangement of a series is obtained by permuting the 
terms in the sum into some other order. It is important that all of the original 
terms eventually appear in the new ordering and that no term gets repeated. 
In an earlier discussion from Section 2.1, we formed a rearrangement of the 
alternating harmonic series by taking two positive terms for each negative term: 

$$1 + \frac{1}{3} - \frac{1}{2} + \frac{1}{5} + \frac{1}{7} - \frac{1}{4} + \cdots.$$

There are clearly an infinite number of rearrangements of any sum; however, it 
is helpful to see why neither 

$$1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots$$

nor 

$$1 + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} + \frac{1}{7} - \frac{1}{8} + \frac{1}{9} + \frac{1}{11} - \frac{1}{12} + \cdots$$

is considered a rearrangement of the original alternating harmonic series. 

Definition 2.7.9. Let $\sum_{k=1}^\infty a_k$ be a series. A series $\sum_{k=1}^\infty b_k$ is called a 
rearrangement of $\sum_{k=1}^\infty a_k$ if there exists a one-to-one, onto function $f : \mathbf{N} \to \mathbf{N}$ 
such that $b_{f(k)} = a_k$ for all $k \in \mathbf{N}$. 

We now have all the tools and notation in place to resolve an issue raised 
at the beginning of the chapter. In Section 2.1, we constructed a particular 
rearrangement of the alternating harmonic series that converges to a limit dif- 
ferent from that of the original series. This happens because the convergence is 
conditional. 
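
The "two positive terms for each negative term" rearrangement of the alternating harmonic series can be generated and summed directly. The sketch below is an illustration under stated assumptions: the names `rearranged_terms` and `rearranged_partial_sum` are invented here, and the comparison value $\tfrac{3}{2}\ln 2$ is the known sum of this particular rearrangement (the original series sums to $\ln 2$), quoted rather than proved.

```python
import math
from itertools import count, islice


def rearranged_terms():
    """Yield 1, 1/3, -1/2, 1/5, 1/7, -1/4, ...: two positive terms per negative term."""
    odd = count(1, 2)    # denominators of the positive terms
    even = count(2, 2)   # denominators of the negative terms
    while True:
        yield 1.0 / next(odd)
        yield 1.0 / next(odd)
        yield -1.0 / next(even)


def rearranged_partial_sum(n_blocks):
    """Sum of the first 3 * n_blocks terms of the rearranged series."""
    return sum(islice(rearranged_terms(), 3 * n_blocks))


print(rearranged_partial_sum(2000), 1.5 * math.log(2), math.log(2))
```

Every term of the original series is used exactly once, yet the partial sums head for a different limit, which is possible only because the convergence is conditional.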

Theorem 2.7.10. If a series converges absolutely, then any rearrangement of 
this series converges to the same limit. 

Proof. Assume $\sum_{k=1}^\infty a_k$ converges absolutely to $A$, and let $\sum_{k=1}^\infty b_k$ be a 
rearrangement of $\sum_{k=1}^\infty a_k$. Let’s use 

$$s_n = \sum_{k=1}^n a_k = a_1 + a_2 + \cdots + a_n$$

for the partial sums of the original series and use 

$$t_m = \sum_{k=1}^m b_k = b_1 + b_2 + \cdots + b_m$$

for the partial sums of the rearranged series. Thus we want to show that 
$(t_m) \to A$. 

Let $\epsilon > 0$. By hypothesis, $(s_n) \to A$, so choose $N_1$ such that 

$$|s_n - A| < \frac{\epsilon}{2}$$

for all $n \ge N_1$. Because the convergence is absolute, we can choose $N_2$ so that 

$$\sum_{k=m+1}^n |a_k| < \frac{\epsilon}{2}$$

for all $n > m \ge N_2$. Now, take $N = \max\{N_1, N_2\}$. We know that the finite set 
of terms $\{a_1, a_2, a_3, \ldots, a_N\}$ must all appear in the rearranged series, and we 
want to move far enough out in the series $\sum_{n=1}^\infty b_n$ so that we have included all 
of these terms. Thus, choose 

$$M = \max\{f(k) : 1 \le k \le N\}.$$

It should now be evident that if $m \ge M$, then $(t_m - s_N)$ consists of a finite 
set of terms, the absolute values of which appear in the tail $\sum_{k=N+1}^\infty |a_k|$. Our 
choice of $N_2$ earlier then guarantees $|t_m - s_N| < \epsilon/2$, and so 

$$|t_m - A| = |t_m - s_N + s_N - A| \le |t_m - s_N| + |s_N - A| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

whenever $m \ge M$. □
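
The bookkeeping in this proof (the choice of $M$ and the claim about $t_m - s_N$) can be traced on a concrete example. The sketch below is an assumption-laden illustration, not part of the proof: it uses the absolutely convergent series with $a_k = (-1)^{k+1}/k^2$, rearranged in the "two positive, one negative" pattern, and the helper names `g`, `f`, and `compare` are invented for this purpose.

```python
from fractions import Fraction


def a(k):
    """Original terms a_k = (-1)^(k+1) / k^2 (an absolutely convergent example)."""
    return Fraction((-1) ** (k + 1), k * k)


def g(i):
    """Index of the original term placed at position i, so that b_i = a_{g(i)}."""
    q, r = divmod(i - 1, 3)  # block q = 0, 1, 2, ... and slot r within the block
    return (4 * q + 1, 4 * q + 3, 2 * q + 2)[r]


def f(k):
    """Position f(k) with b_{f(k)} = a_k, found by direct search."""
    i = 1
    while g(i) != k:
        i += 1
    return i


def compare(N, m):
    """Return (M, |t_m - s_N|, sum of |a_k| over the leftover indices k > N)."""
    M = max(f(k) for k in range(1, N + 1))
    if m < M:
        raise ValueError("need m >= M so that a_1, ..., a_N all appear in t_m")
    s_N = sum(a(k) for k in range(1, N + 1))
    t_m = sum(a(g(i)) for i in range(1, m + 1))
    leftover = [g(i) for i in range(1, m + 1) if g(i) > N]
    return M, abs(t_m - s_N), sum(abs(a(k)) for k in leftover)


print(compare(4, 6))
print(compare(10, 40))
```

In each output the middle entry never exceeds the last one: once $m \ge M$, the difference $t_m - s_N$ is built only from terms with index beyond $N$, so it is controlled by the tail of $\sum |a_k|$.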

Exercises 

Exercise 2.7.1. Proving the Alternating Series Test (Theorem 2.7.7) amounts 
to showing that the sequence of partial sums 

$$s_n = a_1 - a_2 + a_3 - \cdots \pm a_n$$

converges. (The opening example in Section 2.1 includes a typical illustration 
of $(s_n)$.) Different characterizations of completeness lead to different proofs. 

(a) Prove the Alternating Series Test by showing that $(s_n)$ is a Cauchy 
sequence. 

(b) Supply another proof for this result using the Nested Interval Property 
(Theorem 1.4.1). 

(c) Consider the subsequences $(s_{2n})$ and $(s_{2n+1})$, and show how the Monotone 
Convergence Theorem leads to a third proof for the Alternating Series 
Test. 

Exercise 2.7.2. Decide whether each of the following series converges or 
diverges: 

(a) $\sum_{n=1}^\infty \frac{1}{2^n + n}$ 

(b) $\sum_{n=1}^\infty \frac{\sin(n)}{n^2}$ 

(c) $1 - \frac{3}{4} + \frac{4}{6} - \frac{5}{8} + \frac{6}{10} - \frac{7}{12} + \cdots$ 

(d) $1 + \frac{1}{2} - \frac{1}{3} + \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} + \frac{1}{8} - \frac{1}{9} + \cdots$ 

(e) $1 - \frac{1}{2^2} + \frac{1}{3} - \frac{1}{4^2} + \frac{1}{5} - \frac{1}{6^2} + \frac{1}{7} - \frac{1}{8^2} + \cdots$ 

Exercise 2.7.3. (a) Provide the details for the proof of the Comparison Test 

(Theorem 2.7.4) using the Cauchy Criterion for Series. 

(b) Give another proof for the Comparison Test, this time using the Monotone 
Convergence Theorem. 

Exercise 2.7.4. Give an example of each or explain why the request is impossible, 
referencing the proper theorem(s). 

(a) Two series $\sum x_n$ and $\sum y_n$ that both diverge but where $\sum x_n y_n$ converges. 

(b) A convergent series $\sum x_n$ and a bounded sequence $(y_n)$ such that $\sum x_n y_n$ 
diverges. 

(c) Two sequences $(x_n)$ and $(y_n)$ where $\sum x_n$ and $\sum (x_n + y_n)$ both converge 
but $\sum y_n$ diverges. 

(d) A sequence $(x_n)$ satisfying $0 \le x_n \le 1/n$ where $\sum (-1)^n x_n$ diverges. 

Exercise 2.7.5. Now that we have proved the basic facts about geometric 
series, supply a proof for Corollary 2.4.7. 

Exercise 2.7.6. Let’s say that a series subverges if the sequence of partial 
sums contains a subsequence that converges. Consider this (invented) definition 
for a moment, and then decide which of the following statements are valid 
propositions about subvergent series: 

(a) If $(a_n)$ is bounded, then $\sum a_n$ subverges. 

(b) All convergent series are subvergent. 

(c) If $\sum |a_n|$ subverges, then $\sum a_n$ subverges as well. 

(d) If $\sum a_n$ subverges, then $(a_n)$ has a convergent subsequence. 

Exercise 2.7.7. (a) Show that if $a_n > 0$ and $\lim(n a_n) = l$ with $l \ne 0$, then 
the series $\sum a_n$ diverges. 

(b) Assume $a_n > 0$ and $\lim(n^2 a_n)$ exists. Show that $\sum a_n$ converges. 

Exercise 2.7.8. Consider each of the following propositions. Provide short 
proofs for those that are true and counterexamples for any that are not. 

(a) If $\sum a_n$ converges absolutely, then $\sum a_n^2$ also converges absolutely. 

(b) If $\sum a_n$ converges and $(b_n)$ converges, then $\sum a_n b_n$ converges. 

(c) If $\sum a_n$ converges conditionally, then $\sum n^2 a_n$ diverges. 

Exercise 2.7.9 (Ratio Test). Given a series $\sum_{n=1}^\infty a_n$ with $a_n \ne 0$, the Ratio 
Test states that if $(a_n)$ satisfies 

$$\lim \left| \frac{a_{n+1}}{a_n} \right| = r < 1,$$

then the series converges absolutely. 

(a) Let $r'$ satisfy $r < r' < 1$. Explain why there exists an $N$ such that $n \ge N$ 
implies $|a_{n+1}| \le |a_n| r'$. 

(b) Why does $|a_N| \sum (r')^n$ converge? 

(c) Now, show that $\sum |a_n|$ converges, and conclude that $\sum a_n$ converges. 

Exercise 2.7.10 (Infinite Products). Review Exercise 2.4.10 about infinite 
products and then answer the following questions: 

(a) Does $\frac{2}{1} \cdot \frac{3}{2} \cdot \frac{5}{4} \cdot \frac{9}{8} \cdot \frac{17}{16} \cdots$ converge? 

(b) The infinite product $\frac{1}{2} \cdot \frac{3}{4} \cdot \frac{5}{6} \cdot \frac{7}{8} \cdot \frac{9}{10} \cdots$ certainly converges. (Why?) Does 
it converge to zero? 

(c) In 1655, John Wallis famously derived the formula 

$$\left(\frac{2 \cdot 2}{1 \cdot 3}\right)\left(\frac{4 \cdot 4}{3 \cdot 5}\right)\left(\frac{6 \cdot 6}{5 \cdot 7}\right)\left(\frac{8 \cdot 8}{7 \cdot 9}\right) \cdots = \frac{\pi}{2}.$$

Show that the left side of this identity at least converges to something. 
(A complete proof of this result is taken up in Section 8.3.) 


Exercise 2.7.11. Find examples of two series $\sum a_n$ and $\sum b_n$ both of which 
diverge but for which $\sum \min\{a_n, b_n\}$ converges. To make it more challenging, 
produce examples where $(a_n)$ and $(b_n)$ are strictly positive and decreasing. 

Exercise 2.7.12 (Summation-by-parts). Let $(x_n)$ and $(y_n)$ be sequences, let 
$s_n = x_1 + x_2 + \cdots + x_n$ and set $s_0 = 0$. Use the observation that $x_j = s_j - s_{j-1}$ 
to verify the formula 

$$\sum_{j=m}^n x_j y_j = s_n y_{n+1} - s_{m-1} y_m + \sum_{j=m}^n s_j (y_j - y_{j+1}).$$

Exercise 2.7.13 (Abel’s Test). Abel’s Test for convergence states that if the 
series $\sum_{k=1}^\infty x_k$ converges, and if $(y_k)$ is a sequence satisfying 

$$y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0,$$

then the series $\sum_{k=1}^\infty x_k y_k$ converges. 

(a) Use Exercise 2.7.12 to show that 

$$\sum_{k=1}^n x_k y_k = s_n y_{n+1} + \sum_{k=1}^n s_k (y_k - y_{k+1}),$$

where $s_n = x_1 + x_2 + \cdots + x_n$. 

(b) Use the Comparison Test to argue that $\sum_{k=1}^\infty s_k (y_k - y_{k+1})$ converges 
absolutely, and show how this leads directly to a proof of Abel’s Test. 

Exercise 2.7.14 (Dirichlet’s Test). Dirichlet’s Test for convergence states 
that if the partial sums of $\sum_{k=1}^\infty x_k$ are bounded (but not necessarily convergent), 
and if $(y_k)$ is a sequence satisfying $y_1 \ge y_2 \ge y_3 \ge \cdots \ge 0$ with $\lim y_k = 0$, 
then the series $\sum_{k=1}^\infty x_k y_k$ converges. 

(a) Point out how the hypothesis of Dirichlet’s Test differs from that of Abel’s 
Test in Exercise 2.7.13, but show that essentially the same strategy can 
be used to provide a proof. 

(b) Show how the Alternating Series Test (Theorem 2.7.7) can be derived as 
a special case of Dirichlet’s Test. 

2.8 Double Summations and Products 
of Infinite Series 

Given a doubly indexed array of real numbers $\{a_{ij} : i, j \in \mathbf{N}\}$, we discovered 
in Section 2.1 that there is a dangerous ambiguity in how we might define 
$\sum_{i,j=1}^\infty a_{ij}$. Performing the sum over first one of the variables and then the 
other is referred to as an iterated summation. In our specific example, summing 
the rows first and then taking the sum of these totals produced a different result 
than first computing the sum of each column and adding these sums together. 
In short, 

$$\sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij} \ne \sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}.$$

There are still other ways to reasonably define $\sum_{i,j=1}^\infty a_{ij}$. One natural idea 
is to calculate a kind of partial sum by adding together finite numbers of terms 
in larger and larger “rectangles” in the array; that is, for $m, n \in \mathbf{N}$, set 

$$s_{mn} = \sum_{i=1}^m \sum_{j=1}^n a_{ij}. \qquad (1)$$

The order of the sum here is irrelevant because the sum is finite. Of particular 
interest to our discussion are the sums $s_{nn}$ (sums over “squares”), which form 
a legitimate sequence indexed by $n$ and thus can be subjected to our arsenal 
of theorems and definitions. If the sequence $(s_{nn})$ converges, for instance, we 
might wish to define 

$$\sum_{i,j=1}^\infty a_{ij} = \lim_{n \to \infty} s_{nn}.$$


Exercise 2.8.1. Using the particular array $(a_{ij})$ from Section 2.1, compute 
$\lim_{n \to \infty} s_{nn}$. How does this value compare to the two iterated values for the 
sum already computed? 


There is a deep similarity between the issue of how to define a double summa- 
tion and the topic of rearrangements discussed at the end of Section 2.7. Both 
relate to the commutativity of addition in an infinite setting. For rearrange- 
ments, the resolution came with the added hypothesis of absolute convergence, 
and it is not surprising that the same remedy applies for double summations. 
Under the assumption of absolute convergence, each of the methods discussed 
for computing the value of a double sum yields the same result. 


Exercise 2.8.2. Show that if the iterated series 

$$\sum_{i=1}^\infty \sum_{j=1}^\infty |a_{ij}|$$

converges (meaning that for each fixed $i \in \mathbf{N}$ the series $\sum_{j=1}^\infty |a_{ij}|$ converges to 
some real number $b_i$, and the series $\sum_{i=1}^\infty b_i$ converges as well), then the iterated 
series 

$$\sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}$$

converges. 


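Theorem 2.8.1. Let $\{a_{ij} : i, j \in \mathbf{N}\}$ be a doubly indexed array of real numbers. 
If 

$$\sum_{i=1}^\infty \sum_{j=1}^\infty |a_{ij}|$$

converges, then both $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}$ and $\sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij}$ converge to the same 
value. Moreover, 

$$\lim_{n \to \infty} s_{nn} = \sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij} = \sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij},$$

where $s_{nn} = \sum_{i=1}^n \sum_{j=1}^n a_{ij}$. 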

Proof. In the same way that we defined the rectangular partial sums $s_{mn}$ above 
in equation (1), define 

$$t_{mn} = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|.$$

Exercise 2.8.3. (a) Prove that $(t_{nn})$ converges. 

(b) Now, use the fact that $(t_{nn})$ is a Cauchy sequence to argue that $(s_{nn})$ 
converges. 


We can now set 

$$S = \lim_{n \to \infty} s_{nn}.$$

In order to prove the theorem, we must show that the two iterated sums converge 
to this same limit. We will first show that 

$$S = \sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}.$$

Because $\{t_{mn} : m, n \in \mathbf{N}\}$ is bounded above, we can let 

$$B = \sup\{t_{mn} : m, n \in \mathbf{N}\}.$$

Exercise 2.8.4. (a) Let $\epsilon > 0$ be arbitrary and argue that there exists an 
$N_1 \in \mathbf{N}$ such that $m, n \ge N_1$ implies $B - \frac{\epsilon}{2} < t_{mn} \le B$. 

(b) Now, show that there exists an $N$ such that 

$$|s_{mn} - S| < \epsilon$$

for all $m, n \ge N$. 


For the moment, consider $m \in \mathbf{N}$ to be fixed and write $s_{mn}$ as 

$$s_{mn} = \sum_{j=1}^n a_{1j} + \sum_{j=1}^n a_{2j} + \cdots + \sum_{j=1}^n a_{mj}.$$

Our hypothesis guarantees that for each fixed row $i$, the series $\sum_{j=1}^\infty a_{ij}$ converges 
absolutely to some real number $r_i$. 

Exercise 2.8.5. (a) Show that for all $m \ge N$ 

$$|(r_1 + r_2 + \cdots + r_m) - S| \le \epsilon.$$

Conclude that the iterated sum $\sum_{i=1}^\infty \sum_{j=1}^\infty a_{ij}$ converges to $S$. 

(b) Finish the proof by showing that the other iterated sum, $\sum_{j=1}^\infty \sum_{i=1}^\infty a_{ij}$, 
converges to $S$ as well. Notice that the same argument can be used once 
it is established that, for each fixed column $j$, the sum $\sum_{i=1}^\infty a_{ij}$ converges 
to some real number $c_j$. □

One final common way of computing a double summation is to sum along 
diagonals where $i + j$ equals a constant. Given a doubly indexed array $\{a_{ij} : i, j \in \mathbf{N}\}$, let 

$$d_2 = a_{11}, \quad d_3 = a_{12} + a_{21}, \quad d_4 = a_{13} + a_{22} + a_{31},$$

and in general set 

$$d_k = a_{1,k-1} + a_{2,k-2} + \cdots + a_{k-1,1}.$$

Then, $\sum_{k=2}^\infty d_k$ represents another reasonable way of summing over every $a_{ij}$ in 
the array. 
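
The square partial sums $s_{nn}$ and the diagonal sums $d_k$ can be compared on an array whose double sum converges absolutely. The sketch below is illustrative only: the array $a_{ij} = 1/(2^i 3^j)$ is an assumption chosen because its total, $1 \cdot \tfrac{1}{2} = \tfrac{1}{2}$, is easy to verify by hand (it is not the array from Section 2.1), and the function names are invented for this example.

```python
from fractions import Fraction


def a(i, j):
    """Illustrative array a_ij = 1 / (2^i 3^j); an assumed example with total sum 1/2."""
    return Fraction(1, 2**i * 3**j)


def s(m, n):
    """Rectangular partial sum s_mn: sum of a_ij over 1 <= i <= m, 1 <= j <= n."""
    return sum(a(i, j) for i in range(1, m + 1) for j in range(1, n + 1))


def d(k):
    """Diagonal sum d_k = a_{1,k-1} + a_{2,k-2} + ... + a_{k-1,1}."""
    return sum(a(i, k - i) for i in range(1, k))


def diagonal_partial_sum(K):
    """d_2 + d_3 + ... + d_K."""
    return sum(d(k) for k in range(2, K + 1))


print(float(s(20, 20)), float(diagonal_partial_sum(40)))
```

Both numbers are close to $1/2$, in line with the claim that under absolute convergence the different summation schemes agree.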

Exercise 2.8.6. (a) Assuming the hypothesis (and hence the conclusion) of 
Theorem 2.8.1, show that $\sum_{k=2}^\infty d_k$ converges absolutely. 

(b) Imitate the strategy in the proof of Theorem 2.8.1 to show that $\sum_{k=2}^\infty d_k$ 
converges to $S = \lim_{n \to \infty} s_{nn}$. 

Products of Series 

Conspicuously missing from the Algebraic Limit Theorem for Series (Theorem 2.7.1) 
is any statement about the product of two convergent series. One 
way to formally carry out the algebra on such a product is to write 

$$\begin{aligned}
\left(\sum_{i=1}^\infty a_i\right)\left(\sum_{j=1}^\infty b_j\right) &= (a_1 + a_2 + a_3 + \cdots)(b_1 + b_2 + b_3 + \cdots) \\
&= a_1 b_1 + (a_1 b_2 + a_2 b_1) + (a_3 b_1 + a_2 b_2 + a_1 b_3) + \cdots \\
&= \sum_{k=2}^\infty d_k,
\end{aligned}$$

where 

$$d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1.$$


This particular form of the product, examined earlier in Exercise 2.8.6, is called 
the Cauchy product of two series. Although there is something algebraically 
natural about writing the product in this form, it may very well be that com- 
puting the value of the sum is more easily done via one or the other iterated 
summation. The question remains, then, as to how the value of the Cauchy 
product — if it exists — is related to these other values of the double sum. If the 
two series being multiplied converge absolutely, it is not too difficult to prove 
that the sum may be computed in whatever way is most convenient. 


Exercise 2.8.7. Assume that $\sum_{i=1}^\infty a_i$ converges absolutely to $A$, and $\sum_{j=1}^\infty b_j$ 
converges absolutely to $B$. 

(a) Show that the iterated sum $\sum_{i=1}^\infty \sum_{j=1}^\infty |a_i b_j|$ converges so that we may 
apply Theorem 2.8.1. 

(b) Let $s_{nn} = \sum_{i=1}^n \sum_{j=1}^n a_i b_j$, and prove that $\lim_{n \to \infty} s_{nn} = AB$. Conclude 
that 

$$\sum_{i=1}^\infty \sum_{j=1}^\infty a_i b_j = \sum_{j=1}^\infty \sum_{i=1}^\infty a_i b_j = \sum_{k=2}^\infty d_k = AB,$$

where, as before, $d_k = a_1 b_{k-1} + a_2 b_{k-2} + \cdots + a_{k-1} b_1$. 


2.9 Epilogue 


Theorems 2.7.10 and 2.8.1 make it clear that absolute convergence is an 
extremely desirable quality to have when manipulating series. On the other 
hand, the situation for conditionally convergent series is delightfully patholog- 
ical. In the case of rearrangements, not only are they no longer guaranteed to 
converge to the same limit, but in fact if $\sum_{n=1}^\infty a_n$ converges conditionally, then 
for any $r \in \mathbf{R}$ there exists a rearrangement of $\sum_{n=1}^\infty a_n$ that converges to $r$. To 
see why, let’s look again at the alternating harmonic series 

$$\sum_{n=1}^\infty \frac{(-1)^{n+1}}{n}.$$

The negative terms taken alone form the series $\sum_{n=1}^\infty (-1)/2n$. The partial 
sums of this series are precisely $-1/2$ the partial sums of the harmonic series, 
and so march off (at half speed) to negative infinity. A similar argument shows 
that the sum of positive terms $\sum_{n=1}^\infty 1/(2n - 1)$ also diverges to infinity. It is 
not too difficult to argue that this situation is always the case for conditionally 
convergent series. Now, let r be some proposed limit, which, for the sake of 
this argument, we take to be positive. The idea is to take as many positive 
terms as necessary to form the first partial sum greater than r. We then add 
negative terms until the partial sum falls below r, at which point we switch back 
to positive terms. The fact that there is no bound on the sums of either the 
positive terms or the negative terms allows this process to continue indefinitely. 
The fact that the terms themselves tend to zero is enough to guarantee that the 
partial sums, when constructed in this manner, indeed converge to r as they 
oscillate around this target value. 
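
The construction just described translates almost word for word into a greedy procedure. The following is a minimal sketch, assuming the alternating harmonic series as the conditionally convergent input; the function name `rearrange_to` is invented here, and the tie-breaking rule (add a positive term whenever the running total is at most $r$) is an implementation choice rather than something fixed by the text.

```python
def rearrange_to(r, n_terms):
    """Greedy rearrangement of the alternating harmonic series aimed at the target r.

    Returns the first n_terms partial sums of the rearranged series.
    """
    pos = 1   # next odd denominator: positive terms 1, 1/3, 1/5, ...
    neg = 2   # next even denominator: negative terms -1/2, -1/4, ...
    total = 0.0
    sums = []
    for _ in range(n_terms):
        if total <= r:
            total += 1.0 / pos   # climb toward (and just past) the target
            pos += 2
        else:
            total -= 1.0 / neg   # fall back to (or just below) the target
            neg += 2
        sums.append(total)
    return sums


for target in (1.0, 2.0):
    print(target, rearrange_to(target, 100_000)[-1])
```

Each original term is used exactly once and in increasing order of denominator within its sign class, so the output really is a rearrangement; the overshoot at each crossing is bounded by the size of the term just added, which shrinks to zero.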

Perhaps the best way to summarize the situation is to say that the hypothe- 
sis of absolute convergence essentially allows us to treat infinite sums as though 
they were finite sums. This assessment extends to double sums as well, although 
there are a few subtleties to address. In the case of products, we showed in Ex- 
ercise 2.8.7 that the Cauchy product of two absolutely convergent infinite series 
converges to the product of the two factors, but in fact the same conclusion 
follows if we only have absolute convergence in one of the two original series. In 
the notation of Exercise 2.8.7, if $\sum a_n$ converges absolutely to $A$, and if $\sum b_n$ 
converges (perhaps conditionally) to $B$, then the Cauchy product $\sum d_k = AB$. 
On the other hand, if both $\sum a_n$ and $\sum b_n$ converge conditionally, then it is 
possible for the Cauchy product to diverge. Squaring $\sum (-1)^n/\sqrt{n}$ provides an 
example of this phenomenon. Of course, it is also possible to find $\sum a_n = A$ 
conditionally and $\sum b_n = B$ conditionally whose Cauchy product $\sum d_k$ converges. 
If this is the case, then the convergence is to the right value, namely $\sum d_k = AB$. 
A proof of this last fact will be offered in Chapter 6 (Exercise 6.5.9), where we 
undertake the study of power series. Here is the connection. A power series 
has the form $a_0 + a_1 x + a_2 x^2 + \cdots$. If we multiply two power series together as 
though they were polynomials, then when we collect common powers of $x$ the 
result is 

$$\begin{aligned}
(a_0 + a_1 x + a_2 x^2 + \cdots)(b_0 + b_1 x + b_2 x^2 + \cdots) &= a_0 b_0 + (a_0 b_1 + a_1 b_0)x + (a_0 b_2 + a_1 b_1 + a_2 b_0)x^2 + \cdots \\
&= d_0 + d_1 x + d_2 x^2 + \cdots,
\end{aligned}$$

which is the Cauchy product of $\sum a_n x^n$ and $\sum b_n x^n$. (The index starts with 
$n = 0$ rather than $n = 1$.) Upcoming results about the good behavior of power 
series will lead to a proof that convergent Cauchy products sum to the proper 
value. In the other direction, Exercise 2.8.7 will be useful in establishing a 
theorem about the product of two power series. 
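
The rule for collecting common powers of $x$ can be written out as a short computation. This is a sketch only: the helper name `cauchy_product` is hypothetical, and the series are represented by finite lists holding their first few coefficients, indexed from 0.

```python
def cauchy_product(a, b):
    """Return d_0, ..., d_{n-1} with d_k = a_0*b_k + a_1*b_{k-1} + ... + a_k*b_0.

    `a` and `b` hold the leading coefficients of two series; only as many
    d_k are returned as can be computed from the shorter list.
    """
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

# (1 + x + x^2 + ...)(1 + x + x^2 + ...) = 1 + 2x + 3x^2 + ...
ones = [1] * 6
print(cauchy_product(ones, ones))  # [1, 2, 3, 4, 5, 6]
```

Each $d_k$ involves only finitely many coefficients, so the truncated lists give the exact leading terms of the product.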


Chapter 3 

Basic Topology of R 


3.1 Discussion: The Cantor Set 


What follows is a fascinating mathematical construction, due to Georg Cantor, 
which is extremely useful for extending the horizons of our intuition about the 
nature of subsets of the real line. Cantor’s name has already appeared in the 
first chapter in our discussion of uncountable sets. Indeed, Cantor’s proof that 
R is uncountable occupies another spot on the short list of the most significant 
contributions toward understanding the mathematical infinite. In the words of 
the mathematician David Hilbert, “No one shall expel us from the paradise that 
Cantor has created for us.” 

Let $C_0$ be the closed interval $[0, 1]$, and define $C_1$ to be the set that results
when the open middle third is removed; that is,

\[ C_1 = C_0 \setminus \left(\tfrac13, \tfrac23\right) = \left[0, \tfrac13\right] \cup \left[\tfrac23, 1\right]. \]

Now, construct $C_2$ in a similar way by removing the open middle third of each
of the two components of $C_1$:

\[ C_2 = \left(\left[0, \tfrac19\right] \cup \left[\tfrac29, \tfrac13\right]\right) \cup \left(\left[\tfrac23, \tfrac79\right] \cup \left[\tfrac89, 1\right]\right). \]

If we continue this process inductively, then for each $n = 0, 1, 2, \ldots$ we get a set
$C_n$ consisting of $2^n$ closed intervals each having length $1/3^n$. Finally, we define
the Cantor set $C$ (Fig. 3.1) to be the intersection

\[ C = \bigcap_{n=0}^{\infty} C_n. \]

Figure 3.1: Defining the Cantor set; $C = \bigcap_{n=0}^{\infty} C_n$. (Stages shown: $C_0$, $C_1$, $C_2$, $C_3$.)


It may be useful to understand C as the remainder of the interval [0, 1] after 
the iterative process of removing open middle thirds is taken to infinity: 


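\[ C = [0, 1] \setminus \left[ \left(\tfrac13, \tfrac23\right) \cup \left(\tfrac19, \tfrac29\right) \cup \left(\tfrac79, \tfrac89\right) \cup \cdots \right]. \]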


There is some initial doubt whether anything remains at all, but notice that
because we are always removing open middle thirds, for every $n \in \mathbf{N}$ we have
$0 \in C_n$ and hence $0 \in C$. The same argument shows $1 \in C$. In fact, if $y$ is the
endpoint of some closed interval of some particular set $C_n$, then it is also an
endpoint of one of the intervals of $C_{n+1}$. Because, at each stage, endpoints are
never removed, it follows that $y \in C_n$ for all $n$. Thus, $C$ at least contains the
endpoints of all of the intervals that make up each of the sets $C_n$.
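
The inductive construction, and the observation that endpoints are never removed, can be checked for the first few stages. The sketch below uses exact rational arithmetic; the names `next_stage`, `cantor_stage`, and `endpoints` are illustrative and not taken from the text.

```python
from fractions import Fraction

def next_stage(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))   # left-hand closed third
        result.append((b - third, b))   # right-hand closed third
    return result

def cantor_stage(n):
    """Return the closed intervals making up C_n, as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = next_stage(intervals)
    return intervals

def endpoints(stage):
    """Set of all endpoints of the intervals in a stage."""
    return {p for interval in stage for p in interval}

c2 = cantor_stage(2)
print([(str(a), str(b)) for a, b in c2])
# [('0', '1/9'), ('2/9', '1/3'), ('2/3', '7/9'), ('8/9', '1')]

# Endpoints of C_2 are still endpoints of C_3.
print(endpoints(c2) <= endpoints(cantor_stage(3)))  # True
```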

Is there anything else? Is C countable? Does C contain any intervals? Any 
irrational numbers? These are difficult questions at the moment. All of the 
endpoints mentioned earlier are rational numbers (they have the form $m/3^n$),
which means that if it is true that $C$ consists of only these endpoints, then $C$
would be a subset of $\mathbf{Q}$ and hence countable. We shall see about this. There is
some strong evidence that not much is left in $C$ if we consider the total length of
the intervals removed. To form $C_1$, an open interval of length 1/3 was taken out.
In the second step, we removed two intervals of length 1/9, and to construct
$C_n$ we removed $2^{n-1}$ middle thirds of length $1/3^n$. There is some logic, then,
to defining the “length” of $C$ to be 1 minus the total

\[ \frac13 + 2\left(\frac19\right) + 4\left(\frac1{27}\right) + \cdots + 2^{n-1}\left(\frac{1}{3^n}\right) + \cdots = \frac{1/3}{1 - 2/3} = 1. \]

The Cantor set has zero length.
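
The total length removed after finitely many steps can be tallied exactly. This is a small sketch under the counts stated above ($2^{k-1}$ intervals of length $1/3^k$ at step $k$); the function name `removed_length` is an illustrative choice.

```python
from fractions import Fraction

def removed_length(n):
    """Total length removed in the first n steps: sum of 2**(k-1) / 3**k, k = 1..n."""
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, n + 1))

for n in (1, 2, 5, 20):
    print(n, float(removed_length(n)))

# What is left after n steps has total length exactly (2/3)**n, which tends to 0.
print(1 - removed_length(20) == Fraction(2, 3) ** 20)  # True
```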

To this point, the information we have collected suggests a mental picture 
of C as a relatively small, thin set. For these reasons, the set C is often referred 
to as Cantor “dust.” But there are some strong counterarguments that imply 
a very different picture. First, $C$ is actually uncountable, with cardinality equal
to the cardinality of $\mathbf{R}$. One slightly intuitive but convincing way to see this is
to create a 1-1 correspondence between $C$ and sequences of the form $(a_n)_{n=1}^{\infty}$,
where $a_n = 0$ or 1. For each $c \in C$, set $a_1 = 0$ if $c$ falls in the left-hand component
of $C_1$ and set $a_1 = 1$ if $c$ falls in the right-hand component. Having established
where in $C_1$ the point $c$ is located, there are now two possible components of
$C_2$ that might contain $c$. This time, we set $a_2 = 0$ or 1 depending on whether $c$
falls in the left or right half of these two components of $C_2$. Continuing in this
way, we come to see that every element $c \in C$ yields a sequence $(a_1, a_2, a_3, \ldots)$
of zeros and ones that acts as a set of directions for how to locate $c$ within $C$.
Likewise, every such sequence corresponds to a point in the Cantor set. Because
the set of sequences of zeros and ones is uncountable (Exercise 1.6.4), we must
conclude that $C$ is uncountable as well.

Figure 3.2: Magnifying sets by a factor of 3.
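
The idea of a sequence of zeros and ones acting as directions can be made concrete for finitely many steps. In this sketch the function name `locate` is hypothetical, and a finite list of directions only narrows the search to one closed interval of the corresponding stage rather than to a single point.

```python
from fractions import Fraction

def locate(directions):
    """Follow left/right directions (0 = left, 1 = right) through C_1, C_2, ...

    Returns the closed interval of C_n, with n = len(directions), that the
    directions lead to, as a (left, right) pair.
    """
    left, right = Fraction(0), Fraction(1)
    for a in directions:
        third = (right - left) / 3
        if a == 0:
            right = left + third      # keep the left-hand component
        else:
            left = right - third      # keep the right-hand component
    return left, right

print(locate([1, 0]))  # (Fraction(2, 3), Fraction(7, 9))
```

Each additional direction shrinks the interval by a factor of 3, which is why an infinite sequence of directions singles out one point.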

What does this imply? In the first place, because the endpoints of the
approximating sets $C_n$ form a countable set, we are forced to accept the fact
that not only are there other points in $C$ but there are uncountably many of
them. From the point of view of cardinality, $C$ is quite large, as large as $\mathbf{R}$,
in fact. This should be contrasted with the fact that from the point of view of
length, $C$ measures the same size as a single point. We conclude this discussion
with a demonstration that from the point of view of dimension, $C$ strangely
falls somewhere in between.

There is a sensible agreement that a point has dimension zero, a line segment 
has dimension one, a square has dimension two, and a cube has dimension three. 
Without attempting a formal definition of dimension (of which there are several), 
we can nevertheless get a sense of how one might be defined by observing how 
the dimension affects the result of magnifying each particular set by a factor 
of 3 (Fig. 3.2). (The reason for the choice of 3 will become clear when we turn
our attention back to the Cantor set.) A single point undergoes no change
at all, whereas a line segment triples in length. For the square, magnifying 
each length by a factor of 3 results in a larger square that contains 9 copies 
of the original square. Finally, the magnified cube yields a cube that contains 
27 copies of the original cube within its volume. Notice that, in each case, to 
compute the “size” of the new set, the dimension appears as the exponent of
the magnification factor.

             dim    ×3    new copies
  point       0     →     1 = 3^0
  segment     1     →     3 = 3^1
  square      2     →     9 = 3^2
  cube        3     →     27 = 3^3
  C           x     →     2 = 3^x

Figure 3.3: Dimension of $C$; $2 = 3^x \Rightarrow x = \log 2 / \log 3$.


Now, apply this transformation to the Cantor set. The set $C_0 = [0, 1]$
becomes the interval $[0, 3]$. Deleting the middle third leaves $[0, 1] \cup [2, 3]$, which
is where we started in the original construction except that we now stand to
produce an additional copy of $C$ in the interval $[2, 3]$. Magnifying the Cantor set
by a factor of 3 yields two copies of the original set. Thus, if $x$ is the dimension
of $C$, then $x$ should satisfy $2 = 3^x$, or $x = \log 2/\log 3 \approx .631$ (Fig. 3.3).
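
The relationship "number of copies equals magnification factor raised to the dimension" can be solved for the dimension numerically. The helper name `similarity_dimension` below is an illustrative label for this one informal calculation, not a formal definition of dimension.

```python
import math

def similarity_dimension(copies, factor):
    """Solve copies = factor**x for x."""
    return math.log(copies) / math.log(factor)

print(similarity_dimension(9, 3))   # square: approximately 2
print(similarity_dimension(27, 3))  # cube: approximately 3
print(similarity_dimension(2, 3))   # Cantor set: approximately 0.631
```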

The notion of a noninteger or fractional dimension is the impetus behind 
the term “fractal,” coined in 1975 by Benoit Mandelbrot to describe a class
of sets whose intricate structures have much in common with the Cantor set. 
Cantor’s construction, however, is over a hundred years old and for us represents 
an invaluable testing ground for the upcoming theorems and conjectures about 
the often elusive nature of subsets of the real line. 


3.2 Open and Closed Sets 


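Given $a \in \mathbf{R}$ and $\epsilon > 0$, recall that the $\epsilon$-neighborhood of $a$ is the set

\[ V_\epsilon(a) = \{x \in \mathbf{R} : |x - a| < \epsilon\}. \]

In other words, $V_\epsilon(a)$ is the open interval $(a - \epsilon, a + \epsilon)$, centered at $a$ with
radius $\epsilon$.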


Definition 3.2.1. A set $O \subseteq \mathbf{R}$ is open if for all points $a \in O$ there exists an
$\epsilon$-neighborhood $V_\epsilon(a) \subseteq O$.

Example 3.2.2. (i) Perhaps the simplest example of an open set is $\mathbf{R}$ itself.
Given an arbitrary element $a \in \mathbf{R}$, we are free to pick any $\epsilon$-neighborhood
we like and it will always be true that $V_\epsilon(a) \subseteq \mathbf{R}$. It is also the case that
the logical structure of Definition 3.2.1 requires us to classify the empty
set $\emptyset$ as an open subset of the real line.

(ii) For a more useful collection of examples, consider the open interval

\[ (c, d) = \{x \in \mathbf{R} : c < x < d\}. \]

To see that $(c, d)$ is open in the sense just defined, let $x \in (c, d)$ be
arbitrary. If we take $\epsilon = \min\{x - c, d - x\}$, then it follows that
$V_\epsilon(x) \subseteq (c, d)$. It is important to see where this argument breaks down
if the interval includes either one of its endpoints.
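
The choice of radius in part (ii) can be tried on a specific interval. The following sketch assumes the interval $(0, 1)$ and the point $1/10$ purely for illustration, and the helper name `neighborhood_radius` is hypothetical.

```python
from fractions import Fraction

def neighborhood_radius(x, c, d):
    """Return min{x - c, d - x}, the radius used for a point x of (c, d)."""
    return min(x - c, d - x)

c, d = Fraction(0), Fraction(1)
x = Fraction(1, 10)
eps = neighborhood_radius(x, c, d)

# (x - eps, x + eps) sits inside (c, d) exactly when c <= x - eps and x + eps <= d.
print(eps, c <= x - eps and x + eps <= d)  # 1/10 True

# At an endpoint the same formula returns 0, which is not a valid radius.
print(neighborhood_radius(c, c, d))  # 0
```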

The union of open intervals is another example of an open set. This obser- 
vation leads to the next result. 

Theorem 3.2.3. (i) The union of an arbitrary collection of open sets is open. 

(ii) The intersection of a finite collection of open sets is open. 

Proof. To prove (i), we let $\{O_\lambda : \lambda \in \Lambda\}$ be a collection of open sets and let
$O = \bigcup_{\lambda \in \Lambda} O_\lambda$. Let $a$ be an arbitrary element of $O$. In order to show that $O$ is
open, Definition 3.2.1 insists that we produce an $\epsilon$-neighborhood of $a$ completely
contained in $O$. But $a \in O$ implies that $a$ is an element of at least one particular
$O_{\lambda'}$. Because we are assuming $O_{\lambda'}$ is open, we can use Definition 3.2.1 to assert
that there exists $V_\epsilon(a) \subseteq O_{\lambda'}$. The fact that $O_{\lambda'} \subseteq O$ allows us to conclude that
$V_\epsilon(a) \subseteq O$. This completes the proof of (i).

For (ii), let $\{O_1, O_2, \ldots, O_N\}$ be a finite collection of open sets. Now, if
$a \in \bigcap_{k=1}^{N} O_k$, then $a$ is an element of each of the open sets. By the definition of
an open set, we know that, for each $1 \le k \le N$, there exists $V_{\epsilon_k}(a) \subseteq O_k$. We
are in search of a single $\epsilon$-neighborhood of $a$ that is contained in every $O_k$, so
the trick is to take the smallest one. Letting $\epsilon = \min\{\epsilon_1, \epsilon_2, \ldots, \epsilon_N\}$, it follows
that $V_\epsilon(a) \subseteq V_{\epsilon_k}(a)$ for all $k$, and hence $V_\epsilon(a) \subseteq \bigcap_{k=1}^{N} O_k$, as desired. □


Closed Sets 


Definition 3.2.4. A point $x$ is a limit point of a set $A$ if every $\epsilon$-neighborhood
$V_\epsilon(x)$ of $x$ intersects the set $A$ at some point other than $x$.

Limit points are also often referred to as “cluster points” or “accumulation
points,” but the phrase “$x$ is a limit point of $A$” has the advantage of explicitly
reminding us that $x$ is quite literally the limit of a sequence in $A$.


Theorem 3.2.5. A point $x$ is a limit point of a set $A$ if and only if $x = \lim a_n$
for some sequence $(a_n)$ contained in $A$ satisfying $a_n \neq x$ for all $n \in \mathbf{N}$.

Proof. ($\Rightarrow$) Assume $x$ is a limit point of $A$. In order to produce a sequence
$(a_n)$ converging to $x$, we are going to consider the particular $\epsilon$-neighborhoods
obtained using $\epsilon = 1/n$. By Definition 3.2.4, every neighborhood of $x$ intersects
$A$ in some point other than $x$. This means that, for each $n \in \mathbf{N}$, we are justified
in picking a point

\[ a_n \in V_{1/n}(x) \cap A \]

with the stipulation that $a_n \neq x$. It should not be too difficult to see why
$(a_n) \to x$. Given an arbitrary $\epsilon > 0$, choose $N$ such that $1/N < \epsilon$. It follows
that $|a_n - x| < \epsilon$ for all $n \ge N$.

($\Leftarrow$) For the reverse implication we assume $\lim a_n = x$ where $a_n \in A$ but
$a_n \neq x$, and let $V_\epsilon(x)$ be an arbitrary $\epsilon$-neighborhood. The definition of
convergence assures us that there exists a term $a_N$ in the sequence satisfying
$a_N \in V_\epsilon(x)$, and the proof is complete. □

The restriction that $a_n \neq x$ in Theorem 3.2.5 deserves a comment. Given
a point $a \in A$, it is always the case that $a$ is the limit of a sequence in $A$ if
we are allowed to consider the constant sequence (a, a, a, ... ). There will be 
occasions where we will want to avoid this somewhat uninteresting situation, so 
it is important to have a vocabulary that can distinguish limit points of a set 
from isolated points. 

Definition 3.2.6. A point $a \in A$ is an isolated point of $A$ if it is not a limit
point of A. 

As a word of caution, we need to be a little careful about how we understand 
the relationship between these concepts. Whereas an isolated point is always 
an element of the relevant set $A$, it is quite possible for a limit point of $A$ not
to belong to A. As an example, consider the endpoint of an open interval. This 
situation is the subject of the next important definition. 

Definition 3.2.7. A set $F \subseteq \mathbf{R}$ is closed if it contains its limit points.

The adjective “closed” appears in several other mathematical contexts and 
is usually employed to mean that an operation on the elements of a given set 
does not take us out of the set. In linear algebra, for example, a vector space 
is a set that is “closed” under addition and scalar multiplication. In analysis, 
the operation we are concerned with is the limiting operation. Topologically 
speaking, a closed set is one where convergent sequences within the set have 
limits that are also in the set. 

Theorem 3.2.8. A set $F \subseteq \mathbf{R}$ is closed if and only if every Cauchy sequence
contained in F has a limit that is also an element of F. 

Proof. Exercise 3.2.5. □ 

Example 3.2.9. (i) Consider

\[ A = \left\{ \frac{1}{n} : n \in \mathbf{N} \right\}. \]

Let’s show that each point of $A$ is isolated. Given $1/n \in A$, choose
$\epsilon = 1/n - 1/(n + 1)$. Then,

\[ V_\epsilon(1/n) \cap A = \left\{ \frac{1}{n} \right\}. \]

It follows from Definition 3.2.4 that $1/n$ is not a limit point and so is
isolated. Although all of the points of $A$ are isolated, the set does have
one limit point, namely 0. This is because every neighborhood centered
at zero, no matter how small, is going to contain points of $A$. Because
$0 \notin A$, $A$ is not closed. The set $F = A \cup \{0\}$ is an example of a closed
set and is called the closure of $A$. (The closure of a set is discussed in a
moment.)

(ii) Let’s prove that a closed interval

\[ [c, d] = \{x \in \mathbf{R} : c \le x \le d\} \]

is a closed set using Definition 3.2.7. If $x$ is a limit point of $[c, d]$, then by
Theorem 3.2.5 there exists $(x_n) \subseteq [c, d]$ with $(x_n) \to x$. We need to prove
that $x \in [c, d]$.

The key to this argument is contained in the Order Limit Theorem
(Theorem 2.3.4), which summarizes the relationship between inequalities
and the limiting process. Because $c \le x_n \le d$, it follows from Theorem
2.3.4 (iii) that $c \le x \le d$ as well. Thus, $[c, d]$ is closed.


(iii) Consider the set $\mathbf{Q} \subseteq \mathbf{R}$ of rational numbers. An extremely important
property of $\mathbf{Q}$ is that its set of limit points is actually all of $\mathbf{R}$. To see
why this is so, recall Theorem 1.4.3 from Chapter 1, which is referred to
as the density property of $\mathbf{Q}$ in $\mathbf{R}$.

Let $y \in \mathbf{R}$ be arbitrary, and consider any neighborhood
$V_\epsilon(y) = (y - \epsilon, y + \epsilon)$. Theorem 1.4.3 allows us to conclude that there
exists a rational number $r \neq y$ that falls in this neighborhood. Thus, $y$ is
a limit point of $\mathbf{Q}$.


The density property of Q can now be reformulated in the following way. 


Theorem 3.2.10 (Density of $\mathbf{Q}$ in $\mathbf{R}$). For every $y \in \mathbf{R}$, there exists a
sequence of rational numbers that converges to $y$.

Proof. Combine the preceding discussion with Theorem 3.2.5. □
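
One concrete way to produce such a sequence is sketched below. It is an illustration only and not the argument of the proof: the choice $r_n = \lfloor ny \rfloor / n$ is one convenient option among many, the function name `rational_approximations` is hypothetical, and a floating-point value stands in for the real number $y$.

```python
from fractions import Fraction
import math

def rational_approximations(y, count):
    """Return rationals r_1, ..., r_count with y - 1/n < r_n <= y.

    Here r_n = floor(n*y)/n, so the distance from r_n to y is less than 1/n.
    """
    return [Fraction(math.floor(n * y), n) for n in range(1, count + 1)]

y = math.sqrt(2)   # a float standing in for the real number sqrt(2)
approximations = rational_approximations(y, 1000)
print(approximations[-1], float(approximations[-1]))  # 707/500 1.414
```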


The same argument can also be used to show that every real number is the 
limit of a sequence of irrational numbers. Although this is interesting, part of the
allure of the rational numbers is that, in addition to being dense in R, they are 
countable. As we will see, this tangible aspect of Q makes it an extremely useful 
set, both for proving theorems and for producing interesting counterexamples. 


Closure 

Definition 3.2.11. Given a set $A \subseteq \mathbf{R}$, let $L$ be the set of all limit points of
$A$. The closure of $A$ is defined to be $\overline{A} = A \cup L$.

In Example 3.2.9 (i), we saw that if $A = \{1/n : n \in \mathbf{N}\}$, then the closure
of $A$ is $\overline{A} = A \cup \{0\}$. Example 3.2.9 (iii) verifies that $\overline{\mathbf{Q}} = \mathbf{R}$. If $A$ is an open
interval $(a, b)$, then $\overline{A} = [a, b]$. If $A$ is a closed interval, then $\overline{A} = A$. It is not
for lack of imagination that in each of these examples $\overline{A}$ is always a closed set.

Theorem 3.2.12. For any $A \subseteq \mathbf{R}$, the closure $\overline{A}$ is a closed set and is the
smallest closed set containing $A$.

Proof. If $L$ is the set of limit points of $A$, then it is immediately clear that $\overline{A}$
contains the limit points of $A$. There is still something more to prove, however,
because taking the union of $L$ with $A$ could potentially produce some new limit
points of $\overline{A}$. In Exercise 3.2.7, we outline the argument that this does not
happen.

Now, any closed set containing $A$ must contain $L$ as well. This shows that
$\overline{A} = A \cup L$ is the smallest closed set containing $A$. □

Complements 

The mathematical notions of open and closed are not antonyms the way they are 
in standard English. If a set is not open, that does not imply it must be closed. 
Many sets such as the half-open interval $(c, d] = \{x \in \mathbf{R} : c < x \le d\}$ are neither
open nor closed. The sets $\mathbf{R}$ and $\emptyset$ are both simultaneously open and closed
although, thankfully, these are the only ones with this disorienting property
(Exercise 3.2.13). There is, however, an important relationship between open
and closed sets. Recall that the complement of a set $A \subseteq \mathbf{R}$ is defined to be
the set

\[ A^c = \{x \in \mathbf{R} : x \notin A\}. \]

Theorem 3.2.13. A set $O$ is open if and only if $O^c$ is closed. Likewise, a set
$F$ is closed if and only if $F^c$ is open.

Proof. Given an open set $O \subseteq \mathbf{R}$, let’s first prove that $O^c$ is a closed set. To
prove $O^c$ is closed, we need to show that it contains all of its limit points. If
$x$ is a limit point of $O^c$, then every neighborhood of $x$ contains some point of
$O^c$. But that is enough to conclude that $x$ cannot be in the open set $O$ because
$x \in O$ would imply that there exists a neighborhood $V_\epsilon(x) \subseteq O$. Thus, $x \in O^c$,
as desired.

For the converse statement, we assume $O^c$ is closed and argue that $O$ is open.
Thus, given an arbitrary point $x \in O$, we must produce an $\epsilon$-neighborhood
$V_\epsilon(x) \subseteq O$. Because $O^c$ is closed, we can be sure that $x$ is not a limit point of
$O^c$. Looking at the definition of limit point, we see that this implies that there
must be some neighborhood $V_\epsilon(x)$ of $x$ that does not intersect the set $O^c$. But
this means $V_\epsilon(x) \subseteq O$, which is precisely what we needed to show.

The second statement in Theorem 3.2.13 follows quickly from the first using
the observation that $(E^c)^c = E$ for any set $E \subseteq \mathbf{R}$. □

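The last theorem of this section should be compared to Theorem 3.2.3.

Theorem 3.2.14. (i) The union of a finite collection of closed sets is closed.

(ii) The intersection of an arbitrary collection of closed sets is closed.

Proof. De Morgan’s Laws state that for any collection of sets $\{E_\lambda : \lambda \in \Lambda\}$ it is
true that

\[ \left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c. \]
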
The result follows directly from these statements and Theorem 3.2.3. The 
details are requested in Exercise 3.2.9. □ 
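
De Morgan's Laws can be sanity-checked on a small finite example. The sketch below is only a check on one hypothetical family of subsets of $\{0, 1, \ldots, 9\}$ and is no substitute for the general proof; the names `complement`, `universe`, and `family` are illustrative.

```python
def complement(s, universe):
    """Complement of s relative to a finite universe."""
    return universe - s

universe = set(range(10))
family = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6, 8}]

union = set().union(*family)
intersection = set(universe).intersection(*family)
complements = [complement(e, universe) for e in family]

# Complement of the union equals the intersection of the complements.
print(complement(union, universe) == set(universe).intersection(*complements))  # True
# Complement of the intersection equals the union of the complements.
print(complement(intersection, universe) == set().union(*complements))  # True
```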

Exercises 

Exercise 3.2.1. (a) Where in the proof of Theorem 3.2.3 part (ii) does the
assumption that the collection of open sets be finite get used?

(b) Give an example of a countable collection of open sets $\{O_1, O_2, O_3, \ldots\}$
whose intersection $\bigcap_{n=1}^{\infty} O_n$ is closed, not empty and not all of $\mathbf{R}$.

Exercise 3.2.2. Let

\[ A = \left\{ (-1)^n + \frac{2}{n} : n = 1, 2, 3, \ldots \right\} \quad \text{and} \quad B = \{x \in \mathbf{Q} : 0 < x < 1\}. \]

Answer the following questions for each set: 

(a) What are the limit points? 

(b) Is the set open? Closed? 

(c) Does the set contain any isolated points? 

(d) Find the closure of the set. 

Exercise 3.2.3. Decide whether the following sets are open, closed, or neither. 
If a set is not open, find a point in the set for which there is no e-neighborhood 
contained in the set. If a set is not closed, find a limit point that is not contained 
in the set. 


(a) $\mathbf{Q}$.

(b) $\mathbf{N}$.

(c) $\{x \in \mathbf{R} : x \neq 0\}$.

(d) $\{1 + 1/4 + 1/9 + \cdots + 1/n^2 : n \in \mathbf{N}\}$.

(e) $\{1 + 1/2 + 1/3 + \cdots + 1/n : n \in \mathbf{N}\}$.

Exercise 3.2.4. Let $A$ be nonempty and bounded above so that $s = \sup A$
exists.

(a) Show that $s \in \overline{A}$.

(b) Can an open set contain its supremum? 

Exercise 3.2.5. Prove Theorem 3.2.8. 

Exercise 3.2.6. Decide whether the following statements are true or false. 
Provide counterexamples for those that are false, and supply proofs for those 
that are true. 

(a) An open set that contains every rational number must necessarily be all 
of R. 

(b) The Nested Interval Property remains true if the term “closed interval” is 
replaced by “closed set.” 

(c) Every nonempty open set contains a rational number. 

(d) Every bounded infinite closed set contains a rational number. 

(e) The Cantor set is closed. 

Exercise 3.2.7. Given $A \subseteq \mathbf{R}$, let $L$ be the set of all limit points of $A$.

(a) Show that the set $L$ is closed.

(b) Argue that if $x$ is a limit point of $A \cup L$, then $x$ is a limit point of $A$. Use
this observation to furnish a proof for Theorem 3.2.12.

Exercise 3.2.8. Assume A is an open set and B is a closed set. Determine if 
the following sets are definitely open, definitely closed, both, or neither. 

(a) $\overline{A \cup B}$

(b) $A \setminus B = \{x \in A : x \notin B\}$

(c) $(A^c \cup B)^c$

(d) $(A \cap B) \cup (A^c \cap B)$

(e) $\overline{A}^c \cap \overline{A^c}$

Exercise 3.2.9 (De Morgan’s Laws). A proof for De Morgan’s Laws in the 
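case of two sets is outlined in Exercise 1.2.5. The general argument is similar.

(a) Given a collection of sets $\{E_\lambda : \lambda \in \Lambda\}$, show that

\[ \left(\bigcup_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcap_{\lambda \in \Lambda} E_\lambda^c \quad \text{and} \quad \left(\bigcap_{\lambda \in \Lambda} E_\lambda\right)^c = \bigcup_{\lambda \in \Lambda} E_\lambda^c. \]
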

(b) Now, provide the details for the proof of Theorem 3.2.14. 

Exercise 3.2.10. Only one of the following three descriptions can be realized. 
Provide an example that illustrates the viable description, and explain why the 
other two cannot exist. 

(i) A countable set contained in [0, 1] with no limit points. 

(ii) A countable set contained in [0, 1] with no isolated points. 

(iii) A set with an uncountable number of isolated points. 

Exercise 3.2.11. (a) Prove that $\overline{A \cup B} = \overline{A} \cup \overline{B}$.

(b) Does this result about closures extend to infinite unions of sets? 

Exercise 3.2.12. Let A be an uncountable set and let B be the set of real 
numbers that divides $A$ into two uncountable sets; that is, $s \in B$ if both
$\{x : x \in A \text{ and } x < s\}$ and $\{x : x \in A \text{ and } x > s\}$ are uncountable.
Show $B$ is nonempty and open.

Exercise 3.2.13. Prove that the only sets that are both open and closed are
$\mathbf{R}$ and the empty set $\emptyset$.

Exercise 3.2.14. A dual notion to the closure of a set is the interior of a set. 
The interior of E is denoted E° and is defined as 


\[ E^\circ = \{x \in E : \text{there exists } V_\epsilon(x) \subseteq E\}. \]


Results about closures and interiors possess a useful symmetry.

(a) Show that $E$ is closed if and only if $\overline{E} = E$. Show that $E$ is open if and
only if $E^\circ = E$.

(b) Show that $\overline{E}^c = (E^c)^\circ$, and similarly that $(E^\circ)^c = \overline{E^c}$.

Exercise 3.2.15. A set $A$ is called an $F_\sigma$ set if it can be written as the countable
union of closed sets. A set $B$ is called a $G_\delta$ set if it can be written as the
countable intersection of open sets.

(a) Show that a closed interval $[a, b]$ is a $G_\delta$ set.

(b) Show that the half-open interval $(a, b]$ is both a $G_\delta$ and an $F_\sigma$ set.

(c) Show that $\mathbf{Q}$ is an $F_\sigma$ set, and the set of irrationals $\mathbf{I}$ forms a $G_\delta$ set.
(We will see in Section 3.5 that $\mathbf{Q}$ is not a $G_\delta$ set, nor is $\mathbf{I}$ an $F_\sigma$ set.)


3.3 Compact Sets


The central challenge in analysis is to exploit the power of the mathematical 
infinite — via limits, series, derivatives, integrals, etc. — without falling victim to 
erroneous logic or faulty intuition. A major tool for maintaining a rigorous 
footing in this endeavor is the concept of compact sets. In ways that will be- 
come clear, especially in our upcoming study of continuous functions, employing 
compact sets in a proof often has the effect of bringing a finite quality to the 
argument, thereby making it much more tractable. 

Definition 3.3.1 (Compactness). A set $K \subseteq \mathbf{R}$ is compact if every sequence
in $K$ has a subsequence that converges to a limit that is also in $K$.


Example 3.3.2. The most basic example of a compact set is a closed interval.
To see this, notice that if $(a_n)$ is contained in an interval $[c, d]$, then the
Bolzano-Weierstrass Theorem guarantees that we can find a convergent
subsequence $(a_{n_k})$. Because a closed interval is a closed set (Example 3.2.9 (ii)),
we know that the limit of this subsequence is also in $[c, d]$.


What are the properties of closed intervals that we used in the preceding 
argument? The Bolzano-Weierstrass Theorem requires boundedness, and we
used the fact that closed sets contain their limit points. As we are about to 
see, these two properties completely characterize compact sets in R. The term 
“bounded” has thus far only been used to describe sequences (Definition 2.3.1), 
but an analogous statement can easily be made about sets. 


Definition 3.3.3. A set $A \subseteq \mathbf{R}$ is bounded if there exists $M > 0$ such that
$|a| \le M$ for all $a \in A$.


Theorem 3.3.4 (Characterization of Compactness in $\mathbf{R}$). A set $K \subseteq \mathbf{R}$
is compact if and only if it is closed and bounded.


Proof. Let $K$ be compact. We will first prove that $K$ must be bounded, so
assume, for contradiction, that $K$ is not a bounded set. The idea is to produce
a sequence in $K$ that marches off to infinity in such a way that it cannot have a
convergent subsequence as the definition of compact requires. To do this, notice
that because $K$ is not bounded there must exist an element $x_1 \in K$ satisfying
$|x_1| > 1$. Likewise, there must exist $x_2 \in K$ with $|x_2| > 2$, and in general, given
any $n \in \mathbf{N}$, we can produce $x_n \in K$ such that $|x_n| > n$.

Now, because $K$ is assumed to be compact, $(x_n)$ should have a convergent
subsequence $(x_{n_k})$. But the elements of the subsequence must satisfy
$|x_{n_k}| > n_k$, and consequently $(x_{n_k})$ is unbounded. Because convergent sequences
are bounded (Theorem 2.3.2), we have a contradiction. Thus, $K$ must at least be a
bounded set.

Next, we will show that $K$ is also closed. To see that $K$ contains its limit
points, we let $x = \lim x_n$, where $(x_n)$ is contained in $K$ and argue that $x$
must be in $K$ as well. By Definition 3.3.1, the sequence $(x_n)$ has a convergent
subsequence $(x_{n_k})$, and by Theorem 2.5.2, we know $(x_{n_k})$ converges to the same
limit $x$. Finally, Definition 3.3.1 requires that $x \in K$. This proves that $K$ is
closed.

The proof of the converse statement is requested in Exercise 3.3.3. □ 

There may be a temptation to consider closed intervals as being a kind of 
standard archetype for compact sets, but this is misleading. The structure of 
compact sets can be much more intricate and interesting. For instance, one 
implication of Theorem 3.3.4 is that the Cantor set is compact. It is more 
useful to think of compact sets as generalizations of closed intervals. Whenever 
a fact involving closed intervals is true, it is often the case that the same result 
holds when we replace “closed interval” with “compact set.” As an example, 
let’s experiment with the Nested Interval Property proved in the first chapter. 

Theorem 3.3.5 (Nested Compact Set Property). If

\[ K_1 \supseteq K_2 \supseteq K_3 \supseteq K_4 \supseteq \cdots \]

is a nested sequence of nonempty compact sets, then the intersection
$\bigcap_{n=1}^{\infty} K_n$ is not empty.

Proof. In order to take advantage of the compactness of each $K_n$, we are going
to produce a sequence that is eventually in each of these sets. Thus, for each
$n \in \mathbf{N}$, pick a point $x_n \in K_n$. Because the compact sets are nested, it follows
that the sequence $(x_n)$ is contained in $K_1$. By Definition 3.3.1, $(x_n)$ has a
convergent subsequence $(x_{n_k})$ whose limit $x = \lim x_{n_k}$ is an element of $K_1$.

In fact, $x$ is an element of every $K_n$ for essentially the same reason. Given
a particular $n_0 \in \mathbf{N}$, the terms in the sequence $(x_n)$ are contained in $K_{n_0}$ as
long as $n \ge n_0$. Ignoring the finite number of terms for which $n_k < n_0$, the
same subsequence $(x_{n_k})$ is then also contained in $K_{n_0}$. The conclusion is that
$x = \lim x_{n_k}$ is an element of $K_{n_0}$. Because $n_0$ was arbitrary,
$x \in \bigcap_{n=1}^{\infty} K_n$ and the proof is complete. □

Open Covers 

Defining compactness for sets in R is reminiscent of the situation we encountered 
with completeness in that there are a number of equivalent ways to describe this 
phenomenon. We demonstrated the equivalence of two such characterizations 
in Theorem 3.3.4. What this theorem implies is that we could have decided to 
define compact sets to be sets that are closed and bounded, and then proved that 
sequences contained in compact sets have convergent subsequences with limits 
in the set. There are some larger issues involved in deciding what the definition 
should be, but what is important at this moment is that we be versatile enough 
to use whatever description of compactness is most appropriate for a given 
situation. 

Although Theorem 3.3.4 is sufficient for most of our purposes, there is a 
third important characterization of compactness, equivalent to the two others, 
which is described in terms of open covers and finite subcovers.

Definition 3.3.6. Let $A \subseteq \mathbf{R}$. An open cover for $A$ is a (possibly infinite)
collection of open sets $\{O_\lambda : \lambda \in \Lambda\}$ whose union contains the set $A$; that is,
$A \subseteq \bigcup_{\lambda \in \Lambda} O_\lambda$. Given an open cover for $A$, a finite subcover is a finite
subcollection of open sets from the original open cover whose union still manages
to completely contain $A$.

Example 3.3.7. Consider the open interval $(0, 1)$. For each point $x \in (0, 1)$,
let $O_x$ be the open interval $(x/2, 1)$. Taken together, the infinite collection
$\{O_x : x \in (0, 1)\}$ forms an open cover for the open interval $(0, 1)$. Notice,
however, that it is impossible to find a finite subcover. Given any proposed
finite subcollection

\[ \{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}, \]

set $x' = \min\{x_1, x_2, \ldots, x_n\}$ and observe that any real number $y$ satisfying
$0 < y \le x'/2$ is not contained in the union $\bigcup_{i=1}^{n} O_{x_i}$.

Now, consider a similar cover for the closed interval $[0, 1]$. For $x \in (0, 1)$,
the sets $O_x = (x/2, 1)$ do a fine job covering $(0, 1)$, but in order to have an open
cover of the closed interval $[0, 1]$, we must also cover the endpoints. To remedy
this, we could fix $\epsilon > 0$, and let $O_0 = (-\epsilon, \epsilon)$ and $O_1 = (1 - \epsilon, 1 + \epsilon)$. Then, the
collection

\[ \{O_0, O_1, O_x : x \in (0, 1)\} \]

is an open cover for $[0, 1]$. But this time, notice there is a finite subcover.
Because of the addition of the set $O_0$, we can choose $x'$ so that $x'/2 < \epsilon$. It
follows that $\{O_0, O_{x'}, O_1\}$ is a finite subcover for the closed interval $[0, 1]$.
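
Both halves of this example can be explored with exact arithmetic. The sketch below is illustrative only: open intervals are represented as (left, right) pairs, the helper names are hypothetical, and the final check samples a finite grid of points, which suggests but does not prove that the three sets cover $[0, 1]$.

```python
from fractions import Fraction

def in_interval(y, interval):
    """Is y in the open interval (left, right)?"""
    left, right = interval
    return left < y < right

def O(x):
    """The set O_x = (x/2, 1), for x in (0, 1)."""
    return (x / 2, Fraction(1))

def uncovered_point(xs):
    """Given finitely many x in (0, 1), return a y in (0, 1) missed by every O_x."""
    return min(xs) / 4        # satisfies 0 < y < x'/2 with x' = min(xs)

def finite_subcover(eps):
    """Return [O_0, O_x', O_1] for [0, 1], choosing x' so that x'/2 < eps."""
    x_prime = min(eps, Fraction(1, 2))     # keeps x' in (0, 1) and x'/2 < eps
    return [(-eps, eps), O(x_prime), (1 - eps, 1 + eps)]

xs = [Fraction(1, 2), Fraction(1, 10), Fraction(1, 1000)]
y = uncovered_point(xs)
print(y, any(in_interval(y, O(x)) for x in xs))  # 1/4000 False

cover = finite_subcover(Fraction(1, 100))
grid = [Fraction(k, 1000) for k in range(1001)]
print(all(any(in_interval(p, s) for s in cover) for p in grid))  # True
```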

Theorem 3.3.8 (Heine-Borel Theorem). Let $K$ be a subset of $\mathbf{R}$. All of
the following statements are equivalent in the sense that any one of them implies
the two others:

(i) K is compact. 

(ii) K is closed and bounded. 

(iii) Every open cover for K has a finite subcover. 

Proof. The equivalence of (i) and (ii) is the content of Theorem 3.3.4. What 
remains is to show that (iii) is equivalent to (i) and (ii). Let’s first assume (iii), 
and prove that it implies (ii) (and thus (i) as well). 
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To show that K is bounded, we construct an open cover for K by defining 
O x to be an open interval of radius 1 around each point x E K. In the language 
of neighborhoods, O x = V\(x). The open cover {O x : x E K} then must have 
a finite subcover {0 Xl , 0 X2 , . . . , 0 Xn }. Because K is contained in a finite union 
of bounded sets, K must itself be bounded. 

The proof that $K$ is closed is more delicate, and we argue it by contradiction.
Let $(y_n)$ be a Cauchy sequence contained in $K$ with $\lim y_n = y$. To show that
$K$ is closed, we must demonstrate that $y \in K$, so assume for contradiction that
this is not the case. If $y \notin K$, then every $x \in K$ is some positive distance away
from $y$. We now construct an open cover by taking $O_x$ to be an interval of radius
$|x - y|/2$ around each point $x$ in $K$. Because we are assuming (iii), the resulting
open cover $\{O_x : x \in K\}$ must have a finite subcover $\{O_{x_1}, O_{x_2}, \ldots, O_{x_n}\}$. The
contradiction arises when we realize that, in the spirit of Example 3.3.7, this
finite subcover cannot contain all of the elements of the sequence $(y_n)$. To make
this explicit, set

$$\epsilon_0 = \min\left\{ \frac{|x_i - y|}{2} : 1 \le i \le n \right\}.$$

Because $(y_n) \to y$, we can certainly find a term $y_N$ satisfying $|y_N - y| < \epsilon_0$. But
such a $y_N$ must necessarily be excluded from each $O_{x_i}$, meaning that

$$y_N \notin \bigcup_{i=1}^{n} O_{x_i}.$$

Thus our supposed subcover does not actually cover all of $K$. This contradiction
implies that $y \in K$, and hence $K$ is closed and bounded.
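
The claim that $y_N$ is excluded from every $O_{x_i}$ can be checked directly. The following short estimate is a sketch of that step, using only the definitions of $\epsilon_0$ and $O_{x_i}$ from the proof together with the triangle inequality. For each $1 \le i \le n$,

```latex
\begin{align*}
|y_N - x_i| &\ge |x_i - y| - |y_N - y| \\
            &>   |x_i - y| - \epsilon_0 \\
            &\ge |x_i - y| - \tfrac{1}{2}|x_i - y| \;=\; \tfrac{1}{2}|x_i - y| .
\end{align*}
```

So the distance from $y_N$ to the center $x_i$ exceeds the radius $|x_i - y|/2$ of $O_{x_i}$, and therefore $y_N \notin O_{x_i}$.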

The proof that (ii) implies (iii) is outlined in Exercise 3.3.9. To be historically 
accurate, it is this particular implication that is most appropriately referred to 
as the Heine-Borel Theorem. □ 


Exercises 

Exercise 3.3.1. Show that if $K$ is compact and nonempty, then $\sup K$ and
$\inf K$ both exist and are elements of $K$.

Exercise 3.3.2. Decide which of the following sets are compact. For those that 
are not compact, show how Definition 3.3.1 breaks down. In other words, give 
an example of a sequence contained in the given set that does not possess a 
subsequence converging to a limit in the set. 


(a) $\mathbf{N}$.

(b) $\mathbf{Q} \cap [0, 1]$.

(c) The Cantor set.

(d) $\{1 + 1/2^2 + 1/3^2 + \cdots + 1/n^2 : n \in \mathbf{N}\}$.

(e) $\{1, 1/2, 2/3, 3/4, 4/5, \ldots\}$.

Exercise 3.3.3. Prove the converse of Theorem 3.3.4 by showing that if a set
$K \subseteq \mathbf{R}$ is closed and bounded, then it is compact.

Exercise 3.3.4. Assume K is compact and F is closed. Decide if the following 
sets are definitely compact, definitely closed, both, or neither. 

(a) $K \cap F$

(b) $\overline{F^c \cup K^c}$

(c) $K \setminus F = \{x \in K : x \notin F\}$

(d) $\overline{K \cap F^c}$

Exercise 3.3.5. Decide whether the following propositions are true or false.
If the claim is valid, supply a short proof, and if the claim is false, provide a
counterexample.

(a) The arbitrary intersection of compact sets is compact.

(b) The arbitrary union of compact sets is compact.

(c) Let $A$ be arbitrary, and let $K$ be compact. Then, the intersection $A \cap K$
is compact.

(d) If $F_1 \supseteq F_2 \supseteq F_3 \supseteq F_4 \supseteq \cdots$ is a nested sequence of nonempty closed sets,
then the intersection $\bigcap_{n=1}^{\infty} F_n \ne \emptyset$.

Exercise 3.3.6. This exercise is meant to illustrate the point made in the
opening paragraph to Section 3.3. Verify that the following three statements
are true if every blank is filled in with the word “finite.” Which are true if every
blank is filled in with the word “compact”? Which are true if every blank is
filled in with the word “closed”?

(a) Every ______ set has a maximum.

(b) If $A$ and $B$ are ______, then $A + B = \{a + b : a \in A, b \in B\}$ is also ______.

(c) If $\{A_n : n \in \mathbf{N}\}$ is a collection of ______ sets with the property that
every finite subcollection has a nonempty intersection, then $\bigcap_{n=1}^{\infty} A_n$ is
nonempty as well.

Exercise 3.3.7. As some more evidence of the surprising nature of the Cantor
set, follow these steps to show that the sum $C + C = \{x + y : x, y \in C\}$ is equal
to the closed interval $[0, 2]$. (Keep in mind that $C$ has zero length and contains
no intervals.)

Because $C \subseteq [0,1]$, $C + C \subseteq [0,2]$, so we only need to prove the reverse
inclusion $[0,2] \subseteq \{x + y : x, y \in C\}$. Thus, given $s \in [0,2]$, we must find two
elements $x, y \in C$ satisfying $x + y = s$.

(a) Show that there exist $x_1, y_1 \in C_1$ for which $x_1 + y_1 = s$. Show in general
that, for an arbitrary $n \in \mathbf{N}$, we can always find $x_n, y_n \in C_n$ for which
$x_n + y_n = s$.

(b) Keeping in mind that the sequences $(x_n)$ and $(y_n)$ do not necessarily
converge, show how they can nevertheless be used to produce the desired
$x$ and $y$ in $C$ satisfying $x + y = s$.

Exercise 3.3.8. Let $K$ and $L$ be nonempty compact sets, and define

$$d = \inf\{|x - y| : x \in K \text{ and } y \in L\}.$$

This turns out to be a reasonable definition for the distance between $K$ and $L$.

(a) If $K$ and $L$ are disjoint, show $d > 0$ and that $d = |x_0 - y_0|$ for some $x_0 \in K$
and $y_0 \in L$.

(b) Show that it’s possible to have $d = 0$ if we assume only that the disjoint
sets $K$ and $L$ are closed.


Exercise 3.3.9. Follow these steps to prove the final implication in
Theorem 3.3.8.

Assume $K$ satisfies (i) and (ii), and let $\{O_\lambda : \lambda \in \Lambda\}$ be an open cover for
$K$. For contradiction, let’s assume that no finite subcover exists. Let $I_0$ be a
closed interval containing $K$.

(a) Show that there exists a nested sequence of closed intervals $I_0 \supseteq I_1 \supseteq I_2 \supseteq \cdots$
with the property that, for each $n$, $I_n \cap K$ cannot be finitely covered
and $\lim |I_n| = 0$.

(b) Argue that there exists an $x \in K$ such that $x \in I_n$ for all $n$.

(c) Because $x \in K$, there must exist an open set $O_{\lambda_0}$ from the original
collection that contains $x$ as an element. Explain how this leads to the desired
contradiction.


Exercise 3.3.10. Here is an alternate proof to the one given in Exercise 3.3.9 
for the final implication in the Heine-Borel Theorem. 

Consider the special case where $K$ is a closed interval. Let $\{O_\lambda : \lambda \in \Lambda\}$ be
an open cover for $[a, b]$ and define $S$ to be the set of all $x \in [a, b]$ such that $[a, x]$
has a finite subcover from $\{O_\lambda : \lambda \in \Lambda\}$.

(a) Argue that $S$ is nonempty and bounded, and thus $s = \sup S$ exists.

(b) Now show $s = b$, which implies $[a, b]$ has a finite subcover.


(c) Finally, prove the theorem for an arbitrary closed and bounded set K. 

Exercise 3.3.11. Consider each of the sets listed in Exercise 3.3.2. For each 
one that is not compact, find an open cover for which there is no finite subcover. 

Exercise 3.3.12. Using the concept of open covers (and explicitly avoiding
the Bolzano-Weierstrass Theorem), prove that every bounded infinite set has a
limit point.

Exercise 3.3.13. Let’s call a set clompact if it has the property that every
closed cover (i.e., a cover consisting of closed sets) admits a finite subcover.
Describe all of the clompact subsets of $\mathbf{R}$.
3.4 Perfect Sets and Connected Sets

One of the underlying goals of topology is to strip away all of the extraneous 
information that comes with our intuitive picture of the real numbers and isolate 
just those properties that are responsible for the phenomenon we are studying. 
For example, we were quick to observe that any closed interval is a compact 
set. The content of Theorem 3.3.4, however, is that the compactness of a closed 
interval has nothing to do with the fact that the set is an interval but is a 
consequence of the set being bounded and closed. In Chapter 1, we argued that 
the set of real numbers between 0 and 1 is an uncountable set. This turns out to 
be the case for any nonempty closed set that does not contain isolated points. 

Perfect Sets 

Definition 3.4.1. A set $P \subseteq \mathbf{R}$ is perfect if it is closed and contains no isolated
points.

Closed intervals (other than the singleton sets [a, a]) serve as the most 
obvious class of perfect sets, but there are more interesting examples. 

Example 3.4.2 (Cantor Set). It is not too hard to see that the Cantor set is
perfect. In Section 3.1, we defined the Cantor set as the intersection

$$C = \bigcap_{n=0}^{\infty} C_n,$$

where each $C_n$ is a finite union of closed intervals. By Theorem 3.2.14, each $C_n$
is closed, and by the same theorem, $C$ is closed as well. It remains to show that
no point in $C$ is isolated.

Let $x \in C$ be arbitrary. To convince ourselves that $x$ is not isolated, we must
construct a sequence $(x_n)$ of points in $C$, different from $x$, that converges to $x$.
From our earlier discussion, we know that $C$ at least contains the endpoints of
the intervals that make up each $C_n$. In Exercise 3.4.3, we sketch the argument
that these are all that is needed to construct $(x_n)$.

One argument for the uncountability of the Cantor set was presented in 
Section 3.1. Another, perhaps more satisfying, argument for the same conclusion 
can be obtained from the next theorem. 

Theorem 3.4.3. A nonempty perfect set is uncountable. 

Proof. If $P$ is perfect and nonempty, then it must be infinite because otherwise
it would consist only of isolated points. Let’s assume, for contradiction, that $P$
is countable. Thus, we can write

$$P = \{x_1, x_2, x_3, \ldots\},$$

where every element of $P$ appears on this list. The idea is to construct a
sequence of nested compact sets $K_n$, all contained in $P$, with the property that
$x_1 \notin K_2$, $x_2 \notin K_3$, $x_3 \notin K_4$, .... Some care must be taken to ensure that each
$K_n$ is nonempty, for then we can use Theorem 3.3.5 to produce an

$$x \in \bigcap_{n=1}^{\infty} K_n \subseteq P$$

that cannot be on the list $\{x_1, x_2, x_3, \ldots\}$.

Let $I_1$ be a closed interval that contains $x_1$ in its interior (i.e., $x_1$ is not an
endpoint of $I_1$). Now, $x_1$ is not isolated, so there exists some other point $y_2 \in P$
that is also in the interior of $I_1$. Construct a closed interval $I_2$, centered on $y_2$,
so that $I_2 \subseteq I_1$ but $x_1 \notin I_2$. More explicitly, if $I_1 = [a, b]$, let

$$\epsilon = \min\{y_2 - a,\; b - y_2,\; |x_1 - y_2|\}.$$

Then, the interval $I_2 = [y_2 - \epsilon/2,\, y_2 + \epsilon/2]$ has the desired properties.

This process can be continued. Because $y_2 \in P$ is not isolated, there must exist
another point $y_3 \in P$ in the interior of $I_2$, and we may insist that $y_3 \ne x_2$.
Now, construct $I_3$ centered on $y_3$ and small enough so that $x_2 \notin I_3$ and $I_3 \subseteq I_2$.
Observe that $I_3 \cap P \ne \emptyset$ because this intersection contains at least $y_3$.

If we carry out this construction inductively, the result is a sequence of closed
intervals $I_n$ satisfying

(i) $I_{n+1} \subseteq I_n$,

(ii) $x_n \notin I_{n+1}$, and

(iii) $I_n \cap P \ne \emptyset$.

To finish the proof, we let $K_n = I_n \cap P$. For each $n \in \mathbf{N}$, we have that $K_n$ is
closed because it is the intersection of closed sets, and bounded because it is
contained in the bounded set $I_n$. Hence, $K_n$ is compact. By construction, $K_n$
is not empty and $K_{n+1} \subseteq K_n$. Thus, we can employ the Nested Compact Set
Property (Theorem 3.3.5) to conclude that the intersection

$$\bigcap_{n=1}^{\infty} K_n \ne \emptyset.$$

But each $K_n$ is a subset of $P$, and the fact that $x_n \notin I_{n+1}$ leads to the conclusion
that $\bigcap_{n=1}^{\infty} K_n = \emptyset$, which is the sought-after contradiction. □
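
The inductive construction in this proof can be traced on a concrete case. The sketch below is illustrative only and rests on stated assumptions: it takes the specific perfect set $P = [0,1]$, works with a finite prefix of a proposed list, and uses a hypothetical helper `nested_intervals`. The rule used to pick each new center $y$ is one convenient choice, not something the proof prescribes.

```python
from fractions import Fraction

def nested_intervals(xs):
    """Given a finite list xs of points of P = [0, 1], build closed intervals
    I_1, I_2, ..., returned as (a, b) pairs, with I_{n+1} inside I_n,
    x_n not in I_{n+1}, and every interval centered on a point of P."""
    center = Fraction(xs[0])
    a, b = center - 1, center + 1        # I_1 contains x_1 in its interior
    intervals = [(a, b)]
    for x in xs:
        r = (b - a) / 2                  # half-width of the current interval
        # Candidate centers: points in the interior of the current interval.
        candidates = [center + r / 2, center - r / 2,
                      center + r / 4, center - r / 4]
        # Pick one that lies in P = [0, 1] and differs from the listed point x.
        y = next(p for p in candidates if 0 <= p <= 1 and p != x)
        eps = min(y - a, b - y, abs(x - y))
        a, b, center = y - eps / 2, y + eps / 2, y
        intervals.append((a, b))
    return intervals

# A finite prefix of a list of rationals in [0, 1] (repeats are harmless).
xs = [Fraction(p, q) for q in range(1, 6) for p in range(q + 1)]
last_a, last_b = nested_intervals(xs)[-1]
print(all(not (last_a <= x <= last_b) for x in xs))
```

Because the intervals are nested, the final interval avoids every listed point while still containing a point of $P$, which is the finite shadow of the contradiction reached in the proof.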
Connected Sets

Although the two open intervals (1,2) and (2,5) have the limit point x = 2 in 
common, there is still some space between them in the sense that no limit point 
of one of these intervals is actually contained in the other. Said another way, 
the closure of (1, 2) (see Definition 3.2.11) is disjoint from (2, 5), and the closure 
of (2,5) does not intersect (1,2). Notice that this same observation cannot be 
made about (1,2] and (2,5), even though these latter sets are disjoint. 

Definition 3.4.4. Two nonempty sets $A, B \subseteq \mathbf{R}$ are separated if $\overline{A} \cap B$ and
$A \cap \overline{B}$ are both empty. A set $E \subseteq \mathbf{R}$ is disconnected if it can be written as
$E = A \cup B$, where $A$ and $B$ are nonempty separated sets.

A set that is not disconnected is called a connected set. 

Example 3.4.5. (i) If we let $A = (1, 2)$ and $B = (2, 5)$, then it is not difficult
to verify that $E = (1,2) \cup (2,5)$ is disconnected. Notice that the sets
$C = (1,2]$ and $D = (2,5)$ are not separated because $C \cap \overline{D} = \{2\}$ is
not empty. This should be comforting. The union $C \cup D$ is equal to the
interval $(1,5)$, which better not qualify as a disconnected set. We will
prove in a moment that every interval is a connected subset of $\mathbf{R}$ and vice
versa.

(ii) Let’s show that the set of rational numbers is disconnected. If we let

$$A = \mathbf{Q} \cap (-\infty, \sqrt{2}) \quad \text{and} \quad B = \mathbf{Q} \cap (\sqrt{2}, \infty),$$

then we certainly have $\mathbf{Q} = A \cup B$. The fact that $A \subseteq (-\infty, \sqrt{2})$ implies
(by the Order Limit Theorem) that any limit point of $A$ will necessarily
fall in $(-\infty, \sqrt{2}]$. Because this is disjoint from $B$, we get $\overline{A} \cap B = \emptyset$.
We can similarly show that $A \cap \overline{B} = \emptyset$, which implies that $A$ and $B$ are
separated.

The definition of connected is stated as the negation of disconnected, but a 
little care with the logical negation of the quantifiers in Definition 3.4.4 results 
in a positive characterization of connectedness. Essentially, a set E is connected 
if, no matter how it is partitioned into two nonempty disjoint sets, it is always 
possible to show that at least one of the sets contains a limit point of the other. 

Theorem 3.4.6. A set $E \subseteq \mathbf{R}$ is connected if and only if, for all nonempty
disjoint sets $A$ and $B$ satisfying $E = A \cup B$, there always exists a convergent
sequence $(x_n) \to x$ with $(x_n)$ contained in one of $A$ or $B$, and $x$ an element of
the other.

Proof. Exercise 3.4.6. □ 

The concept of connectedness is more relevant when working with subsets
of the plane and other higher-dimensional spaces. This is because, in $\mathbf{R}$, the
connected sets coincide precisely with the collection of intervals (with the
understanding that unbounded intervals such as $(-\infty, 3)$ and $[0, \infty)$ are included).
Theorem 3.4.7. A set $E \subseteq \mathbf{R}$ is connected if and only if whenever $a < c < b$
with $a, b \in E$, it follows that $c \in E$ as well.

Proof. Assume $E$ is connected, and let $a, b \in E$ and $a < c < b$. Set

$$A = (-\infty, c) \cap E \quad \text{and} \quad B = (c, \infty) \cap E.$$

Because $a \in A$ and $b \in B$, neither set is empty and, just as in Example 3.4.5
(ii), neither set contains a limit point of the other. If $E = A \cup B$, then we would
have that $E$ is disconnected, which it is not. It must then be that $A \cup B$ is
missing some element of $E$, and $c$ is the only possibility. Thus, $c \in E$.

Conversely, assume that $E$ is an interval in the sense that whenever $a, b \in E$
satisfy $a < c < b$ for some $c$, then $c \in E$. Our intent is to use the characterization
of connected sets in Theorem 3.4.6, so let $E = A \cup B$, where $A$ and $B$ are
nonempty and disjoint. We need to show that one of these sets contains a limit
point of the other. Pick $a_0 \in A$ and $b_0 \in B$, and, for the sake of the argument,
assume $a_0 < b_0$. Because $E$ is itself an interval, the interval $I_0 = [a_0, b_0]$ is
contained in $E$. Now, bisect $I_0$ into two equal halves. The midpoint of $I_0$ must
either be in $A$ or $B$, and so choose $I_1 = [a_1, b_1]$ to be the half that allows us to
have $a_1 \in A$ and $b_1 \in B$. Continuing this process yields a sequence of nested
intervals $I_n = [a_n, b_n]$, where $a_n \in A$, $b_n \in B$, and the length $(b_n - a_n) \to 0$.
The remainder of this argument should feel familiar. By the Nested Interval
Property, there exists an

$$x \in \bigcap_{n=0}^{\infty} I_n,$$

and it is straightforward to show that the sequences of endpoints each satisfy
$\lim a_n = x$ and $\lim b_n = x$. But now $x \in E$ must belong to either $A$ or $B$, thus
making it a limit point of the other. This completes the argument. □
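
The bisection step in the second half of this proof is easy to simulate. The following sketch is an illustration under stated assumptions rather than part of the proof: the helper name `bisect_partition` is hypothetical, and the example partition of $E = [0,1]$ into $A = [0, 1/3)$ and $B = [1/3, 1]$ is chosen only to make the outcome visible.

```python
from fractions import Fraction

def bisect_partition(in_A, a0, b0, steps):
    """Repeatedly bisect [a0, b0], keeping the left endpoint in A and the
    right endpoint in B.  in_A(x) is True when x is in A, False when x is in B."""
    a, b = Fraction(a0), Fraction(b0)
    assert in_A(a) and not in_A(b)
    for _ in range(steps):
        mid = (a + b) / 2
        if in_A(mid):
            a = mid      # midpoint lies in A: keep the right half [mid, b]
        else:
            b = mid      # midpoint lies in B: keep the left half [a, mid]
    return a, b

# Illustrative partition of E = [0, 1]: A = [0, 1/3) and B = [1/3, 1].
third = Fraction(1, 3)
a_n, b_n = bisect_partition(lambda x: x < third, 0, 1, 20)
print(b_n - a_n)            # interval length after 20 bisections
print(a_n < third <= b_n)   # the common limit 1/3 is trapped between the endpoints
```

Here the common limit $x = 1/3$ belongs to $B$ while being the limit of the endpoints $a_n \in A$, so $B$ contains a limit point of $A$, as Theorem 3.4.6 requires.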


Exercises 


Exercise 3.4.1. If $P$ is a perfect set and $K$ is compact, is the intersection $P \cap K$
always compact? Always perfect?

Exercise 3.4.2. Does there exist a perfect set consisting of only rational num- 
bers? 


Exercise 3.4.3. Review the portion of the proof given in Example 3.4.2 and
follow these steps to complete the argument.

(a) Because $x \in C_1$, argue that there exists an $x_1 \in C \cap C_1$ with $x_1 \ne x$
satisfying $|x - x_1| \le 1/3$.

(b) Finish the proof by showing that for each $n \in \mathbf{N}$, there exists $x_n \in C \cap C_n$,
different from $x$, satisfying $|x - x_n| \le 1/3^n$.


Exercise 3.4.4. Repeat the Cantor construction from Section 3.1 starting with 
the interval [0, 1]. This time, however, remove the open middle fourth from each 
component. 
(a) Is the resulting set compact? Perfect?

(b) Using the algorithms from Section 3.1, compute the length and dimension 
of this Cantor-like set. 

Exercise 3.4.5. Let $A$ and $B$ be nonempty subsets of $\mathbf{R}$. Show that if there
exist disjoint open sets $U$ and $V$ with $A \subseteq U$ and $B \subseteq V$, then $A$ and $B$ are
separated.

Exercise 3.4.6. Prove Theorem 3.4.6. 

Exercise 3.4.7. A set $E$ is totally disconnected if, given any two distinct points
$x, y \in E$, there exist separated sets $A$ and $B$ with $x \in A$, $y \in B$, and $E = A \cup B$.

(a) Show that Q is totally disconnected. 

(b) Is the set of irrational numbers totally disconnected? 

Exercise 3.4.8. Follow these steps to show that the Cantor set is totally dis- 
connected in the sense described in Exercise 3.4.7. 

Let $C = \bigcap_{n=0}^{\infty} C_n$, as defined in Section 3.1.

(a) Given $x, y \in C$, with $x < y$, set $\epsilon = y - x$. For each $n = 0, 1, 2, \ldots$, the
set $C_n$ consists of a finite number of closed intervals. Explain why there
must exist an $N$ large enough so that it is impossible for $x$ and $y$ both to
belong to the same closed interval of $C_N$.

(b) Show that C is totally disconnected. 

Exercise 3.4.9. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the rational numbers,
and for each $n \in \mathbf{N}$ set $\epsilon_n = 1/2^n$. Define $O = \bigcup_{n=1}^{\infty} V_{\epsilon_n}(r_n)$, and let $F = O^c$.

(a) Argue that F is a closed, nonempty set consisting only of irrational 
numbers. 

(b) Does F contain any nonempty open intervals? Is F totally disconnected? 
(See Exercise 3.4.7 for the definition.) 

(c) Is it possible to know whether F is perfect? If not, can we modify this 
construction to produce a nonempty perfect set of irrational numbers? 

3.5 Baire’s Theorem 

The nature of the real line can be deceptively elusive. The closer we look, the 
more intricate and enigmatic R becomes, and the more we are reminded to pro- 
ceed carefully (i.e., axiomatically) with all of our conclusions about properties 
of subsets of R. The structure of open sets is fairly straightforward. Every open 
set is either a finite or countable union of open intervals. Standing in opposition 
to this tidy description of all open sets is the Cantor set. The Cantor set is a
closed, uncountable set that contains no intervals of any kind. Thus, no such 
characterization of closed sets should be anticipated. 

Recall that the arbitrary union of open sets is always an open set. Likewise, 
the arbitrary intersection of closed sets is closed. By taking unions of closed sets 
or intersections of open sets, however, it is possible to obtain a new selection of 
subsets of R. 

Definition 3.5.1. A set $A \subseteq \mathbf{R}$ is called an $F_\sigma$ set if it can be written as the
countable union of closed sets. A set $B \subseteq \mathbf{R}$ is called a $G_\delta$ set if it can be
written as the countable intersection of open sets.

Exercise 3.5.1. Argue that a set $A$ is a $G_\delta$ set if and only if its complement is
an $F_\sigma$ set.


Exercise 3.5.2. Replace each ______ with the word finite or countable,
depending on which is more appropriate.

(a) The ______ union of $F_\sigma$ sets is an $F_\sigma$ set.

(b) The ______ intersection of $F_\sigma$ sets is an $F_\sigma$ set.

(c) The ______ union of $G_\delta$ sets is a $G_\delta$ set.

(d) The ______ intersection of $G_\delta$ sets is a $G_\delta$ set.

Exercise 3.5.3. (This exercise has already appeared as Exercise 3.2.15.) 

(a) Show that a closed interval $[a, b]$ is a $G_\delta$ set.

(b) Show that the half-open interval $(a, b]$ is both a $G_\delta$ and an $F_\sigma$ set.

(c) Show that $\mathbf{Q}$ is an $F_\sigma$ set, and the set of irrationals $\mathbf{I}$ forms a $G_\delta$ set.

It is not readily obvious that the class $F_\sigma$ does not include every subset of
$\mathbf{R}$, but we are now ready to argue that $\mathbf{I}$ is not an $F_\sigma$ set (and consequently
$\mathbf{Q}$ is not a $G_\delta$ set). This will follow from a theorem due to René Louis Baire
(1874-1932).

Recall that a set $G \subseteq \mathbf{R}$ is dense in $\mathbf{R}$ if, given any two real numbers $a < b$,
it is possible to find a point $x \in G$ with $a < x < b$.

Theorem 3.5.2. If $\{G_1, G_2, G_3, \ldots\}$ is a countable collection of dense, open
sets, then the intersection $\bigcap_{n=1}^{\infty} G_n$ is not empty.

Proof. Before embarking on the proof, notice that we have seen a conclusion 
like this before. Theorem 3.3.5 asserts that a nested sequence of compact sets 
has a nontrivial intersection. In this theorem, we are dealing with dense, open 
sets, but as it turns out, we are going to use Theorem 3.3.5 — and actually, just 
the Nested Interval Property — as the crucial step in the argument. 


Exercise 3.5.4. Starting with $n = 1$, inductively construct a nested sequence
of closed intervals $I_1 \supseteq I_2 \supseteq I_3 \supseteq \cdots$ satisfying $I_n \subseteq G_n$. Give special attention
to the issue of the endpoints of each $I_n$. Show how this leads to a proof of the
theorem. □
Exercise 3.5.5. Show that it is impossible to write

$$\mathbf{R} = \bigcup_{n=1}^{\infty} F_n,$$

where for each $n \in \mathbf{N}$, $F_n$ is a closed set containing no nonempty open intervals.

Exercise 3.5.6. Show how the previous exercise implies that the set $\mathbf{I}$ of
irrationals cannot be an $F_\sigma$ set, and $\mathbf{Q}$ cannot be a $G_\delta$ set.

Exercise 3.5.7. Using Exercise 3.5.6 and versions of the statements in
Exercise 3.5.2, construct a set that is neither in $F_\sigma$ nor in $G_\delta$.

Nowhere-Dense Sets 

We have encountered several equivalent ways to assert that a particular set G 
is dense in R. In Section 3.2, we observed that G is dense in R if and only if 
every point of R is a limit point of G. Because the closure of any set is obtained 
by taking the union of the set and its limit points, we have that 

$G$ is dense in $\mathbf{R}$ if and only if $\overline{G} = \mathbf{R}$.

The set Q is dense in R; the set Z is clearly not. In fact, in the jargon of 
analysis, Z is nowhere-dense in R. 

Definition 3.5.3. A set $E$ is nowhere-dense if $\overline{E}$ contains no nonempty open
intervals.

Exercise 3.5.8. Show that a set $E$ is nowhere-dense in $\mathbf{R}$ if and only if the
complement of $\overline{E}$ is dense in $\mathbf{R}$.

Exercise 3.5.9. Decide whether the following sets are dense in R, nowhere- 
dense in R, or somewhere in between. 

(a) $A = \mathbf{Q} \cap [0,5]$.

(b) $B = \{1/n : n \in \mathbf{N}\}$.

(c) the set of irrationals. 

(d) the Cantor set. 

We can now restate Theorem 3.5.2 in a slightly more general form. 

Theorem 3.5.4 (Baire’s Theorem). The set of real numbers R cannot be 
written as the countable union of nowhere-dense sets. 

Proof. For contradiction, assume that $E_1, E_2, E_3, \ldots$ are each nowhere-dense
and satisfy $\mathbf{R} = \bigcup_{n=1}^{\infty} E_n$.


Exercise 3.5.10. Finish the proof by finding a contradiction to the results in 
this section. □ 
Baire’s Theorem is yet another statement about the size of R. We have
already encountered several ways to describe the sizes of infinite sets. In terms
of cardinality, countable sets are relatively small whereas uncountable sets are
large. We also briefly discussed the concept of “length,” or “measure,” in
Section 3.1. Baire’s Theorem offers a third perspective. From this point of
view, nowhere-dense sets are considered to be “thin” sets. Any set that is the
countable union (i.e., a not very large union) of these small sets is called a
“meager” set or a set of “first category.” A set that is not of first category is of
“second category.” Intuitively, sets of the second category are the “fat” subsets.
The Baire Category Theorem, as it is often called, states that R is of second
category.

There is a significance to the Baire Category Theorem that is difficult to 
appreciate at the moment because we are only seeing a special case of this result. 
The real numbers are an example of a complete metric space. Metric spaces are 
discussed in some detail in Section 8.2, but here is the basic idea. Given a set 
of mathematical objects such as real numbers, points in the plane or continuous 
functions defined on [0,1], a “metric” is a rule that assigns a “distance” between 
two elements in the set. In R, we have been using \x — y\ as the distance between 
the real numbers x and y. The point is that if we can create a satisfactory notion 
of “distance” on these other spaces (we will need the triangle inequality to hold, 
for instance), then the concepts of convergence, Cauchy sequences, and open 
sets, for example, can be naturally transferred over. A complete metric space is 
any set with a suitably defined metric in which Cauchy sequences have limits. 
We have spent a good deal of time discussing the fact that R is a complete 
metric space whereas Q is not. 

The Baire Category Theorem in its more general form states that any com- 
plete metric space must be too large to be the countable union of nowhere-dense 
subsets. One particularly interesting example of a complete metric space is the 
set of continuous functions defined on the interval $[0, 1]$. (The distance between
two functions $f$ and $g$ in this space is defined to be $\sup |f(x) - g(x)|$, where
$x \in [0, 1]$.) Now, in this space we will see that the collection of continuous
functions that are differentiable at even one point can be written as the countable
union of nowhere-dense sets. Thus, a fascinating consequence of Baire’s Theo- 
rem in this setting is that most continuous functions do not have derivatives at 
any point. Chapter 5 concludes with a construction of one such function. This 
odd situation mirrors the roles of Q and I as subsets of R. Just as the familiar 
rational numbers constitute a minute proportion of the real line, the differen- 
tiable functions of calculus are exceedingly atypical of continuous functions in 
general. 


Chapter 4 


Functional Limits 
and Continuity 

4.1 Discussion: Examples of Dirichlet 
and Thomae 

Although it is a common practice in calculus courses to discuss continuity before 
differentiation, historically mathematicians’ attention to the concept of continu- 
ity came long after the derivative was in wide use. Pierre de Fermat (1601-1665) 
was using tangent lines to solve optimization problems as early as 1629. On the 
other hand, it was not until around 1820 that Cauchy, Bolzano, Weierstrass, and 
others began to characterize continuity in terms more rigorous than prevailing 
intuitive notions such as “unbroken curves” or “functions which have no jumps 
or gaps.” The basic reason for this two-hundred-year waiting period lies in
the fact that, for most of this time, the very notion of function did not really 
permit discontinuities. Functions were entities such as polynomials, sines, and 
cosines, always smooth and continuous over their relevant domains. The gradual 
liberation of the term function to its modern understanding — a rule associat- 
ing a unique output with a given input — was simultaneous with 19th century 
investigations into the behavior of infinite series. Extensions of the power of 
calculus were intimately tied to the ability to represent a function f(x) as a 
limit of polynomials (called a power series) or as a limit of sums of sines and 
cosines (called a trigonometric or Fourier series). A typical question for Cauchy 
and his contemporaries was whether the continuity of the limiting polynomials 
or trigonometric functions necessarily implied that the limit $f$ would also be
continuous. 

Sequences and series of functions are the topics of Chapter 6. What is 
relevant at this moment is that we realize why the issue of finding a rigorous 
Figure 4.1: Dirichlet’s Function, g(x).


definition for continuity finally made its way to the fore. Any significant progress 
on the question of whether the limit of continuous functions is continuous 
(for Cauchy and for us) necessarily depends on a definition of continuity that 
does not rely on imprecise notions such as “no holes” or “gaps.” With a math- 
ematically unambiguous definition for the limit of a sequence in hand, we are 
well on our way toward a rigorous understanding of continuity. 

Given a function $f$ with domain $A \subseteq \mathbf{R}$, we want to define continuity at a
point $c \in A$ to mean that if $x \in A$ is chosen near $c$, then $f(x)$ will be near $f(c)$.
Symbolically, we will say $f$ is continuous at $c$ if

$$\lim_{x \to c} f(x) = f(c).$$

The problem is that, at present, we only have a definition for the limit of a
sequence, and it is not entirely clear what is meant by $\lim_{x \to c} f(x)$. The
subtleties that arise as we try to fashion such a definition are well-illustrated via a
family of examples, all based on an idea of the prominent German mathematician,
Peter Lejeune Dirichlet. Dirichlet’s idea was to define a function $g$ in a
piecewise manner based on whether the input variable $x$ is rational or
irrational. Specifically, let

$$g(x) = \begin{cases} 1 & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

The intricate way that Q and I fit inside of R makes an accurate graph of g 
technically impossible to draw, but Figure 4.1 illustrates the basic idea. 

Does it make sense to attach a value to the expression $\lim_{x \to 1/2} g(x)$? One
idea is to consider a sequence $(x_n) \to 1/2$. Using our notion of the limit of
a sequence, we might try to define $\lim_{x \to 1/2} g(x)$ as simply the limit of the
sequence $g(x_n)$. But notice that this limit depends on how the sequence $(x_n)$ is
chosen. If each $x_n$ is rational, then

$$\lim_{n \to \infty} g(x_n) = 1.$$

On the other hand, if $x_n$ is irrational for each $n$, then

$$\lim_{n \to \infty} g(x_n) = 0.$$
This unacceptable situation demands that we work harder on our definition of
functional limits. Generally speaking, we want the value of $\lim_{x \to c} g(x)$ to be
independent of how we approach $c$. In this particular case, the definition of a
functional limit that we agree on should lead to the conclusion that

$$\lim_{x \to 1/2} g(x)$$

does not exist.
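
The dependence on the approaching sequence can be seen in a small computation. Floating-point numbers cannot distinguish rationals from irrationals, so the sketch below makes an explicit assumption: inputs are restricted to numbers of the form $p + q\sqrt{2}$ with $p, q$ rational, which are rational exactly when $q = 0$. The two-argument form of `g` and the particular sequences are illustrative choices, not part of the text.

```python
from fractions import Fraction

def g(p, q):
    """Dirichlet's function at x = p + q*sqrt(2), where p and q are rational.
    Such an x is rational exactly when q == 0."""
    return 1 if q == 0 else 0

# x_n = 1/2 + 1/n is rational and converges to 1/2.
rational_seq = [(Fraction(1, 2) + Fraction(1, n), Fraction(0)) for n in range(1, 8)]
# y_n = 1/2 + sqrt(2)/n is irrational and also converges to 1/2.
irrational_seq = [(Fraction(1, 2), Fraction(1, n)) for n in range(1, 8)]

values_rational = [g(p, q) for p, q in rational_seq]
values_irrational = [g(p, q) for p, q in irrational_seq]
print(values_rational)    # constant sequence of ones
print(values_irrational)  # constant sequence of zeros
```

Both input sequences converge to $1/2$, yet the output sequences are constant at different values, so no single number can serve as the limit of $g$ at $1/2$.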


Postponing the search for formal definitions for the moment, we should
nonetheless realize that Dirichlet’s function is not continuous at $c = 1/2$. In fact,
the real significance of this function is that there is nothing unique about the
point $c = 1/2$. Because both $\mathbf{Q}$ and $\mathbf{I}$ (the set of irrationals) are dense in the
real line, it follows that for any $z \in \mathbf{R}$ we can find sequences $(x_n) \subseteq \mathbf{Q}$ and
$(y_n) \subseteq \mathbf{I}$ such that

$$\lim x_n = \lim y_n = z.$$

(See Example 3.2.9 (iii).) Because

$$\lim g(x_n) \ne \lim g(y_n),$$

the same line of reasoning reveals that $g(x)$ is not continuous at $z$. In the jargon
of analysis, Dirichlet’s function is a nowhere-continuous function on $\mathbf{R}$.

What happens if we adjust the definition of $g(x)$ in the following way? Define
a new function $h$ (Fig. 4.2) on $\mathbf{R}$ by setting

$$h(x) = \begin{cases} x & \text{if } x \in \mathbf{Q} \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

If we take $c$ different from zero, then just as before we can construct sequences
$(x_n) \to c$ of rationals and $(y_n) \to c$ of irrationals so that

$$\lim h(x_n) = c \quad \text{and} \quad \lim h(y_n) = 0.$$

Thus, $h$ is not continuous at any point $c \ne 0$.
If $c = 0$, however, then these two limits are both equal to $h(0) = 0$. In fact,
it appears as though no matter how we construct a sequence $(z_n)$ converging to
zero, it will always be the case that $\lim h(z_n) = 0$. This observation goes to the
heart of what we want functional limits to entail. To assert that

$$\lim_{x \to c} h(x) = L$$

should imply that

$$h(z_n) \to L \quad \text{for all sequences } (z_n) \to c.$$

For reasons not yet apparent, it is beneficial to fashion the definition for
functional limits in terms of neighborhoods constructed around $c$ and $L$. We will
quickly see, however, that this topological formulation is equivalent to the
sequential characterization we have arrived at here.
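
The observation about sequences converging to zero can be confirmed with a one-line estimate. As a sketch using only the definition of $h$: since $h(z)$ is always either $z$ or $0$, any sequence $(z_n) \to 0$ satisfies

```latex
\[
  |h(z_n) - 0| \;\le\; |z_n| \;\to\; 0 ,
\]
```

so $\lim h(z_n) = 0 = h(0)$ no matter how rational and irrational terms are interleaved in $(z_n)$.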

To this point, we have been discussing continuity of a function at a particular 
point in its domain. This is a significant departure from thinking of continuous 
functions as curves that can be drawn without lifting the pen from the paper, 
and it leads to some fascinating questions. In 1875, K.J. Thomae discovered the
function

$$t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q}. \end{cases}$$

If $c \in \mathbf{Q}$, then $t(c) > 0$. Because the set of irrationals is dense in $\mathbf{R}$, we can find
a sequence $(y_n)$ in $\mathbf{I}$ converging to $c$. The result is that

$$\lim t(y_n) = 0 \ne t(c),$$

and Thomae’s function (Fig. 4.3) fails to be continuous at any rational point.

The twist comes when we try this argument on some irrational point in the
domain such as $c = \sqrt{2}$. All irrational values get mapped to zero by $t$, so the
natural thing would be to consider a sequence $(x_n)$ of rational numbers that
converges to $\sqrt{2}$. Now, $\sqrt{2} \approx 1.414213\ldots$, so a good start on a particular
sequence of rational approximations for $\sqrt{2}$ might be

\[ \left( 1, \frac{14}{10}, \frac{141}{100}, \frac{1414}{1000}, \frac{14142}{10000}, \frac{141421}{100000}, \ldots \right). \]

But notice that the denominators of these fractions are getting larger. In this
case, the sequence $t(x_n)$ begins,

\[ \left( 1, \frac{1}{5}, \frac{1}{100}, \frac{1}{500}, \frac{1}{5000}, \frac{1}{100000}, \ldots \right) \]

and is fast approaching $0 = t(\sqrt{2})$. We will see that this always happens.
The closer a rational number is chosen to a fixed irrational number, the larger
its denominator must necessarily be. As a consequence, Thomae’s function has
the bizarre property of being continuous at every irrational point on $\mathbf{R}$ and
discontinuous at every rational point.
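
The values of Thomae’s function along the decimal truncations of the square root of 2 can be recomputed mechanically. The following is only a minimal Python sketch, assuming exact rational arithmetic from the standard `fractions` module; the names `thomae`, `approximations`, and `values` are illustrative and do not come from the text.

```python
from fractions import Fraction


def thomae(x: Fraction) -> Fraction:
    """Thomae's function restricted to rational inputs (illustrative helper)."""
    if x == 0:
        return Fraction(1)
    # Fraction always stores x = m/n in lowest terms with n > 0,
    # which is exactly the form the definition of t(x) requires.
    return Fraction(1, x.denominator)


# Decimal truncations of sqrt(2): 1, 14/10, 141/100, 1414/1000, ...
digits = "1414213"
approximations = [
    Fraction(int(digits[:k]), 10 ** (k - 1)) for k in range(1, len(digits) + 1)
]
values = [thomae(x) for x in approximations]

for x, t in zip(approximations, values):
    print(f"x = {x}, t(x) = {t}")
```

The reduced denominators grow as the truncations improve, which is the numerical shadow of the claim that rationals close to a fixed irrational must have large denominators.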

Is there an example of a function with the opposite property? In other words,
does there exist a function defined on all of $\mathbf{R}$ that is continuous on $\mathbf{Q}$ but fails
to be continuous on $\mathbf{I}$? Can the set of discontinuities of a particular function be
arbitrary? If we are given some set $A \subseteq \mathbf{R}$, is it always possible to find a function
that is continuous only on the set $A^c$? In each of the examples in this section, the
functions were defined to have erratic oscillations around points in the domain.
What conclusions can we draw if we restrict our attention to functions that
are somewhat less volatile? One such class is the set of so-called monotone
functions, which are either increasing or decreasing on a given domain. What
might we be able to say about the set of discontinuities of a monotone function
on $\mathbf{R}$?

4.2 Functional Limits 

Consider a function $f : A \to \mathbf{R}$. Recall that a limit point $c$ of $A$ is a point with
the property that every $\epsilon$-neighborhood $V_\epsilon(c)$ intersects $A$ in some point other
than $c$. Equivalently, $c$ is a limit point of $A$ if and only if $c = \lim x_n$ for some
sequence $(x_n) \subseteq A$ with $x_n \neq c$. It is important to remember that limit points
of $A$ do not necessarily belong to the set $A$ unless $A$ is closed.

If $c$ is a limit point of the domain of $f$, then, intuitively, the statement

\[ \lim_{x \to c} f(x) = L \]

is intended to convey that values of $f(x)$ get arbitrarily close to $L$ as $x$ is chosen
closer and closer to $c$. The issue of what happens when $x = c$ is irrelevant from
the point of view of functional limits. In fact, $c$ need not even be in the domain
of $f$.

The structure of the definition of functional limits follows the “challenge-response”
pattern established in the definition for the limit of a sequence. Recall
that given a sequence $(a_n)$, the assertion $\lim a_n = L$ implies that for every
$\epsilon$-neighborhood $V_\epsilon(L)$ centered at $L$, there is a point in the sequence (call it
$a_N$) after which all of the terms $a_n$ fall in $V_\epsilon(L)$. Each $\epsilon$-neighborhood represents
a particular challenge, and each $N$ is the respective response. For functional
limit statements such as $\lim_{x \to c} f(x) = L$, the challenges are still made in
the form of an arbitrary $\epsilon$-neighborhood around $L$, but the response this time
is a $\delta$-neighborhood centered at $c$.

Figure 4.4: Definition of Functional Limit.

Definition 4.2.1 (Functional Limit). Let $f : A \to \mathbf{R}$, and let $c$ be a limit
point of the domain $A$. We say that $\lim_{x \to c} f(x) = L$ provided that, for all
$\epsilon > 0$, there exists a $\delta > 0$ such that whenever $0 < |x - c| < \delta$ (and $x \in A$) it
follows that $|f(x) - L| < \epsilon$.

This is often referred to as the “$\epsilon$-$\delta$ version” of the definition for functional
limits. Recall that the statement

\[ |f(x) - L| < \epsilon \quad \text{is equivalent to} \quad f(x) \in V_\epsilon(L). \]

Likewise, the statement

\[ |x - c| < \delta \quad \text{is satisfied if and only if} \quad x \in V_\delta(c). \]

The additional restriction $0 < |x - c|$ is just an economical way of saying $x \neq c$.
Recasting Definition 4.2.1 in terms of neighborhoods, just as we did for the
definition of convergence of a sequence in Section 2.2, amounts to little more
than a change of notation, but it does help emphasize the geometrical nature of
what is happening (Fig. 4.4).

Definition 4.2.1B (Functional Limit: Topological Version). Let $c$ be a
limit point of the domain of $f : A \to \mathbf{R}$. We say $\lim_{x \to c} f(x) = L$ provided
that, for every $\epsilon$-neighborhood $V_\epsilon(L)$ of $L$, there exists a $\delta$-neighborhood $V_\delta(c)$
around $c$ with the property that for all $x \in V_\delta(c)$ different from $c$ (with $x \in A$)
it follows that $f(x) \in V_\epsilon(L)$.

The parenthetical reminder “$(x \in A)$” present in both versions of the definition
is included to ensure that $x$ is an allowable input for the function in
question. When no confusion is likely, we may omit this reminder with the
understanding that the appearance of $f(x)$ carries with it the implicit assumption
that $x$ is in the domain of $f$. On a related note, there is no reason to discuss
functional limits at isolated points of the domain. Thus, functional limits will
only be considered as $x$ tends toward a limit point of the function’s domain.

Example 4.2.2. (i) To familiarize ourselves with Definition 4.2.1, let’s prove
that if $f(x) = 3x + 1$, then

\[ \lim_{x \to 2} f(x) = 7. \]

Let $\epsilon > 0$. Definition 4.2.1 requires that we produce a $\delta > 0$ so that
$0 < |x - 2| < \delta$ leads to the conclusion $|f(x) - 7| < \epsilon$. Notice that

\[ |f(x) - 7| = |(3x + 1) - 7| = |3x - 6| = 3|x - 2|. \]

Thus, if we choose $\delta = \epsilon/3$, then $0 < |x - 2| < \delta$ implies $|f(x) - 7| <
3(\epsilon/3) = \epsilon$.

(ii) Let’s show

\[ \lim_{x \to 2} g(x) = 4, \]

where $g(x) = x^2$. Given an arbitrary $\epsilon > 0$, our goal this time is to make
$|g(x) - 4| < \epsilon$ by restricting $|x - 2|$ to be smaller than some carefully
chosen $\delta$. As in the previous problem, a little algebra reveals

\[ |g(x) - 4| = |x^2 - 4| = |x + 2||x - 2|. \]

We can make $|x - 2|$ as small as we like, but we need an upper bound on
$|x + 2|$ in order to know how small to choose $\delta$. The presence of the variable
$x$ causes some initial confusion, but keep in mind that we are discussing
the limit as $x$ approaches 2. If we agree that our $\delta$-neighborhood around
$c = 2$ must have radius no bigger than $\delta = 1$, then we get the upper bound
$|x + 2| \leq |3 + 2| = 5$ for all $x \in V_\delta(c)$.

Now, choose $\delta = \min\{1, \epsilon/5\}$. If $0 < |x - 2| < \delta$, then it follows that

\[ |x^2 - 4| = |x + 2||x - 2| < (5)\frac{\epsilon}{5} = \epsilon, \]

and the limit is proved.
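
The choice of response in part (ii) can be spot-checked numerically. The sketch below is a sanity check rather than a proof: it samples finitely many points in the punctured neighborhood of 2 and confirms that the response min{1, e/5} keeps the squares within the challenge e of 4. The function names `delta_for` and `delta_works` and the sample count are assumptions made purely for illustration.

```python
def delta_for(eps: float) -> float:
    """The response chosen in Example 4.2.2 (ii): delta = min{1, eps/5}."""
    return min(1.0, eps / 5.0)


def delta_works(eps: float, samples: int = 10_000) -> bool:
    """Check |x^2 - 4| < eps at sampled points with 0 < |x - 2| < delta."""
    delta = delta_for(eps)
    for k in range(1, samples):
        offset = delta * k / samples  # strictly between 0 and delta
        for x in (2.0 - offset, 2.0 + offset):
            if not abs(x * x - 4.0) < eps:
                return False
    return True


for eps in (10.0, 1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), delta_works(eps))
```

A finite sample can only fail to find a counterexample; the inequality in the example is what guarantees the response works for every point in the neighborhood.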
Sequential Criterion for Functional Limits

We worked very hard in Chapter 2 to derive an impressive list of proper- 
ties enjoyed by sequential limits. In particular, the Algebraic Limit Theorem 
(Theorem 2.3.3) and the Order Limit Theorem (Theorem 2.3.4) proved invalu- 
able in a large number of the arguments that followed. Not surprisingly, we 
are going to need analogous statements for functional limits. Although it is not 
difficult to generate independent proofs for these statements, all of them will 
follow quite naturally from their sequential analogs once we derive the sequen- 
tial criterion for functional limits motivated in the opening discussion of this 
chapter. 

Theorem 4.2.3 (Sequential Criterion for Functional Limits). Given a
function $f : A \to \mathbf{R}$ and a limit point $c$ of $A$, the following two statements are
equivalent:

(i) $\lim_{x \to c} f(x) = L$.

(ii) For all sequences $(x_n) \subseteq A$ satisfying $x_n \neq c$ and $(x_n) \to c$, it follows that
$f(x_n) \to L$.

Proof. ($\Rightarrow$) Let’s first assume that $\lim_{x \to c} f(x) = L$. To prove (ii), we consider
an arbitrary sequence $(x_n)$, which converges to $c$ and satisfies $x_n \neq c$. Our goal
is to show that the image sequence $f(x_n)$ converges to $L$. This is most easily
seen using the topological formulation of the definition.

Let $\epsilon > 0$. Because we are assuming (i), Definition 4.2.1B implies that
there exists $V_\delta(c)$ with the property that all $x \in V_\delta(c)$ different from $c$ satisfy
$f(x) \in V_\epsilon(L)$. All we need to do then is argue that our particular sequence $(x_n)$
is eventually in $V_\delta(c)$. But we are assuming that $(x_n) \to c$. This implies that
there exists a point $x_N$ after which $x_n \in V_\delta(c)$. It follows that $n > N$ implies
$f(x_n) \in V_\epsilon(L)$, as desired.

($\Leftarrow$) For this implication we give a contrapositive proof, which is essentially
a proof by contradiction. Thus, we assume that statement (ii) is true, and
carefully negate statement (i). To say that

\[ \lim_{x \to c} f(x) \neq L \]

means that there exists at least one particular $\epsilon_0 > 0$ for which no $\delta$ is a suitable
response. In other words, no matter what $\delta > 0$ we try, there will always be at
least one point

\[ x \in V_\delta(c) \text{ with } x \neq c \text{ for which } f(x) \notin V_{\epsilon_0}(L). \]

Now consider $\delta_n = 1/n$. From the preceding discussion, it follows that for each
$n \in \mathbf{N}$ we may pick an $x_n \in V_{\delta_n}(c)$ with $x_n \neq c$ and $f(x_n) \notin V_{\epsilon_0}(L)$. But now
notice that the result of this is a sequence $(x_n) \to c$ with $x_n \neq c$, where the
image sequence $f(x_n)$ certainly does not converge to $L$.

Because this contradicts (ii), which we are assuming is true for this argument,
we may conclude that (i) must also hold. □
Theorem 4.2.3 has several useful corollaries. In addition to the previously
advertised benefit of granting us some short proofs of statements about how 
functional limits interact with algebraic combinations of functions, we also get 
an economical way of establishing that certain limits do not exist. 


Corollary 4.2.4 (Algebraic Limit Theorem for Functional Limits). Let
$f$ and $g$ be functions defined on a domain $A \subseteq \mathbf{R}$, and assume $\lim_{x \to c} f(x) = L$
and $\lim_{x \to c} g(x) = M$ for some limit point $c$ of $A$. Then,

(i) $\lim_{x \to c} k f(x) = kL$ for all $k \in \mathbf{R}$,

(ii) $\lim_{x \to c} [f(x) + g(x)] = L + M$,

(iii) $\lim_{x \to c} [f(x) g(x)] = LM$, and

(iv) $\lim_{x \to c} f(x)/g(x) = L/M$, provided $M \neq 0$.

Proof. These follow from Theorem 4.2.3 and the Algebraic Limit Theorem for
sequences. The details are requested in Exercise 4.2.1. □


Corollary 4.2.5 (Divergence Criterion for Functional Limits). Let $f$ be
a function defined on $A$, and let $c$ be a limit point of $A$. If there exist two
sequences $(x_n)$ and $(y_n)$ in $A$ with $x_n \neq c$ and $y_n \neq c$ and

\[ \lim x_n = \lim y_n = c \quad \text{but} \quad \lim f(x_n) \neq \lim f(y_n), \]

then we can conclude that the functional limit $\lim_{x \to c} f(x)$ does not exist.

Example 4.2.6. Assuming the familiar properties of the sine function, let’s
show that $\lim_{x \to 0} \sin(1/x)$ does not exist (Fig. 4.5).

If $x_n = 1/(2n\pi)$ and $y_n = 1/(2n\pi + \pi/2)$, then $\lim(x_n) = \lim(y_n) = 0$.
However, $\sin(1/x_n) = 0$ for all $n \in \mathbf{N}$ while $\sin(1/y_n) = 1$. Thus,

\[ \lim \sin(1/x_n) \neq \lim \sin(1/y_n), \]

so by Corollary 4.2.5, $\lim_{x \to 0} \sin(1/x)$ does not exist.

Figure 4.5: The function $\sin(1/x)$ near zero.
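
The two sequences of Example 4.2.6 can be evaluated numerically. The following is a rough Python sketch, assuming ordinary floating-point arithmetic (so the computed sines are only approximately 0 and 1); the function names `x_n` and `y_n` simply mirror the notation of the example.

```python
import math


def x_n(n: int) -> float:
    """x_n = 1/(2 n pi), a sequence tending to 0 along which sin(1/x) is 0."""
    return 1.0 / (2 * n * math.pi)


def y_n(n: int) -> float:
    """y_n = 1/(2 n pi + pi/2), tending to 0 along which sin(1/x) is 1."""
    return 1.0 / (2 * n * math.pi + math.pi / 2)


for n in (1, 10, 100, 1000):
    # Both inputs shrink toward 0, yet the outputs stay near 0 and near 1.
    print(n, x_n(n), math.sin(1.0 / x_n(n)), y_n(n), math.sin(1.0 / y_n(n)))
```

Both input sequences approach zero while the two output sequences stay apart, which is the situation Corollary 4.2.5 describes.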
Exercises


Exercise 4.2.1. (a) Supply the details for how Corollary 4.2.4 part (ii) follows 

from the Sequential Criterion for Functional Limits in Theorem 4.2.3 and 
the Algebraic Limit Theorem for sequences proved in Chapter 2. 

(b) Now, write another proof of Corollary 4.2.4 part (ii) directly from Defini- 
tion 4.2.1 without using the sequential criterion in Theorem 4.2.3. 

(c) Repeat (a) and (b) for Corollary 4.2.4 part (iii). 

Exercise 4.2.2. For each stated limit, find the largest possible $\delta$-neighborhood
that is a proper response to the given $\epsilon$ challenge.

(a) $\lim_{x \to 3} (5x - 6) = 9$, where $\epsilon = 1$.

(b) $\lim_{x \to 4} \sqrt{x} = 2$, where $\epsilon = 1$.

(c) $\lim_{x \to \pi} [[x]] = 3$, where $\epsilon = 1$. (The function $[[x]]$ returns the greatest
integer less than or equal to $x$.)

(d) $\lim_{x \to \pi} [[x]] = 3$, where $\epsilon = .01$.


Exercise 4.2.3. Review the definition of Thomae’s function t(x) from 
Section 4.1. 


(a) Construct three different sequences $(x_n)$, $(y_n)$, and $(z_n)$, each of which
converges to 1 without using the number 1 as a term in the sequence.

(b) Now, compute $\lim t(x_n)$, $\lim t(y_n)$, and $\lim t(z_n)$.

(c) Make an educated conjecture for $\lim_{x \to 1} t(x)$, and use Definition 4.2.1B to
verify the claim. (Given $\epsilon > 0$, consider the set of points $\{x \in \mathbf{R} : t(x) \geq \epsilon\}$.
Argue that all the points in this set are isolated.)

Exercise 4.2.4. Consider the reasonable but erroneous claim that

\[ \lim_{x \to 10} 1/[[x]] = 1/10. \]

(a) Find the largest $\delta$ that represents a proper response to the challenge of
$\epsilon = 1/2$.

(b) Find the largest $\delta$ that represents a proper response to $\epsilon = 1/50$.

(c) Find the largest $\epsilon$ challenge for which there is no suitable $\delta$ response
possible.
Exercise 4.2.5. Use Definition 4.2.1 to supply a proper proof for the following
limit statements.

(a) $\lim_{x \to 2} (3x + 4) = 10$.

(b) $\lim_{x \to 0} x^3 = 0$.

(c) $\lim_{x \to 2} (x^2 + x - 1) = 5$.

(d) $\lim_{x \to 3} 1/x = 1/3$.

Exercise 4.2.6. Decide if the following claims are true or false, and give short
justifications for each conclusion.

(a) If a particular $\delta$ has been constructed as a suitable response to a particular
$\epsilon$ challenge, then any smaller positive $\delta$ will also suffice.

(b) If $\lim_{x \to a} f(x) = L$ and $a$ happens to be in the domain of $f$, then $L = f(a)$.

(c) If $\lim_{x \to a} f(x) = L$, then $\lim_{x \to a} 3[f(x) - 2]^2 = 3(L - 2)^2$.

(d) If $\lim_{x \to a} f(x) = 0$, then $\lim_{x \to a} f(x)g(x) = 0$ for any function $g$ (with
domain equal to the domain of $f$).

Exercise 4.2.7. Let $g : A \to \mathbf{R}$ and assume that $f$ is a bounded function on $A$
in the sense that there exists $M > 0$ satisfying $|f(x)| \leq M$ for all $x \in A$.

Show that if $\lim_{x \to c} g(x) = 0$, then $\lim_{x \to c} g(x)f(x) = 0$ as well.


Exercise 4.2.8. Compute each limit or state that it does not exist. Use the 
tools developed in this section to justify each conclusion. 

(a) 


(b) $\lim_{x \to 7/4} \dfrac{|x - 2|}{x - 2}$

(c) $\lim_{x \to 0} (-1)^{[[1/x]]}$

(d) linx^o 

Exercise 4.2.9 (Infinite Limits). The statement $\lim_{x \to 0} 1/x^2 = \infty$ certainly
makes intuitive sense. To construct a rigorous definition in the challenge-response
style of Definition 4.2.1 for an infinite limit statement of this form,
we replace the (arbitrarily small) $\epsilon > 0$ challenge with an (arbitrarily large)
$M > 0$ challenge:

Definition: $\lim_{x \to c} f(x) = \infty$ means that for all $M > 0$ we can find a $\delta > 0$
such that whenever $0 < |x - c| < \delta$, it follows that $f(x) > M$.

(a) Show $\lim_{x \to 0} 1/x^2 = \infty$ in the sense described in the previous definition.

(b) Now, construct a definition for the statement $\lim_{x \to \infty} f(x) = L$. Show
$\lim_{x \to \infty} 1/x = 0$.

(c) What would a rigorous definition for $\lim_{x \to \infty} f(x) = \infty$ look like? Give
an example of such a limit.


Exercise 4.2.10 (Right and Left Limits). Introductory calculus courses 
typically refer to the right-hand limit of a function as the limit obtained by 
“letting x approach a from the right-hand side.” 

(a) Give a proper definition in the style of Definition 4.2.1 for the right-hand
and left-hand limit statements:

\[ \lim_{x \to a^+} f(x) = L \quad \text{and} \quad \lim_{x \to a^-} f(x) = M. \]

(b) Prove that $\lim_{x \to a} f(x) = L$ if and only if both the right and left-hand
limits equal $L$.

Exercise 4.2.11 (Squeeze Theorem). Let $f$, $g$, and $h$ satisfy $f(x) \leq g(x) \leq
h(x)$ for all $x$ in some common domain $A$. If $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} h(x) =
L$ at some limit point $c$ of $A$, show $\lim_{x \to c} g(x) = L$ as well.


4.3 Continuous Functions 


We now come to a significant milestone in our progress toward a rigorous theory
of real-valued functions: a proper definition of the seminal concept of continuity
that avoids any intuitive appeals to “unbroken curves” or functions without
“jumps” or “holes.”

Definition 4.3.1 (Continuity). A function $f : A \to \mathbf{R}$ is continuous at a
point $c \in A$ if, for all $\epsilon > 0$, there exists a $\delta > 0$ such that whenever $|x - c| < \delta$
(and $x \in A$) it follows that $|f(x) - f(c)| < \epsilon$.

If $f$ is continuous at every point in the domain $A$, then we say that $f$ is
continuous on $A$.

The definition of continuity looks much like the definition for functional
limits, with a few subtle differences. The most important is that we require the
point $c$ to be in the domain of $f$. The value $f(c)$ then becomes the value of
$\lim_{x \to c} f(x)$. With this observation in mind, it is tempting to shorten Definition
4.3.1 to say that $f$ is continuous at $c \in A$ if

\[ \lim_{x \to c} f(x) = f(c). \]

This is fine as long as $c$ is a limit point of $A$. If $c$ is an isolated point of $A$,
then $\lim_{x \to c} f(x)$ isn’t defined but Definition 4.3.1 can still be applied. An
unremarkable but noteworthy consequence of this definition is that functions are
continuous at isolated points of their domains (Exercise 4.3.5).

We saw in the previous section that, in addition to the standard $\epsilon$-$\delta$ definition,
functional limits have a useful formulation in terms of sequences. The same is
true of continuity. The next theorem summarizes these various equivalent ways
to characterize the continuity of a function at a given point.
Theorem 4.3.2 (Characterizations of Continuity). Let $f : A \to \mathbf{R}$, and let
$c \in A$. The function $f$ is continuous at $c$ if and only if any one of the following
three conditions is met:

(i) For all $\epsilon > 0$, there exists a $\delta > 0$ such that $|x - c| < \delta$ (and $x \in A$) implies
$|f(x) - f(c)| < \epsilon$;

(ii) For all $V_\epsilon(f(c))$, there exists a $V_\delta(c)$ with the property that $x \in V_\delta(c)$ (and
$x \in A$) implies $f(x) \in V_\epsilon(f(c))$;

(iii) If $(x_n) \to c$ (with $x_n \in A$), then $f(x_n) \to f(c)$.

If $c$ is a limit point of $A$, then the above conditions are equivalent to

(iv) $\lim_{x \to c} f(x) = f(c)$.

Proof. Statement (i) is just Definition 4.3.1, and statement (ii) is the standard
rewording of (i) using topological neighborhoods in place of the absolute value
notation. Statement (iii) is equivalent to (i) via an argument nearly identical to
that of Theorem 4.2.3, with some slight modifications for when $x_n = c$. Finally,
statement (iv) is seen to be equivalent to (i) by considering Definition 4.2.1 and
observing that the case $x = c$ (which is excluded in the definition of functional
limits) leads to the requirement $f(c) \in V_\epsilon(f(c))$, which is trivially true. □

The length of this list is somewhat deceiving. Statements (i), (ii), and (iv)
are closely related and essentially remind us that functional limits have an $\epsilon$-$\delta$
formulation as well as a topological description. Statement (iii), however, is
qualitatively different from the others. As a general rule, the sequential
characterization of continuity is typically the most useful for demonstrating that a
function is not continuous at some point.

Corollary 4.3.3 (Criterion for Discontinuity). Let $f : A \to \mathbf{R}$, and let
$c \in A$ be a limit point of $A$. If there exists a sequence $(x_n) \subseteq A$ where $(x_n) \to c$
but such that $f(x_n)$ does not converge to $f(c)$, we may conclude that $f$ is not
continuous at $c$.

The sequential characterization of continuity is also important for the other
reasons that it was important for functional limits. In particular, it allows
us to bring our catalog of results about the behavior of sequences to bear on
the study of continuous functions. The next theorem should be compared to
Corollary 4.2.4 as well as to Theorem 2.3.3.

Theorem 4.3.4 (Algebraic Continuity Theorem). Assume $f : A \to \mathbf{R}$ and
$g : A \to \mathbf{R}$ are continuous at a point $c \in A$. Then,

(i) $kf(x)$ is continuous at $c$ for all $k \in \mathbf{R}$;

(ii) $f(x) + g(x)$ is continuous at $c$;

(iii) $f(x)g(x)$ is continuous at $c$; and

(iv) $f(x)/g(x)$ is continuous at $c$, provided the quotient is defined.
Figure 4.6: The function $x \sin(1/x)$ near zero.


Proof. All of these statements can be quickly derived from Corollary 4.2.4 and 
Theorem 4.3.2. □ 

These results provide us with the tools we need to firm up our arguments 
in the opening section of this chapter about the behavior of Dirichlet’s function 
and Thomae’s function. The details are requested in Exercise 4.3.7. Here are 
some more examples of arguments for and against continuity of some familiar 
functions. 

Example 4.3.5. All polynomials are continuous on $\mathbf{R}$. In fact, rational functions
(i.e., quotients of polynomials) are continuous wherever they are defined.

To see why this is so, consider the identity function $g(x) = x$. Because
$|g(x) - g(c)| = |x - c|$, we can respond to a given $\epsilon > 0$ by choosing $\delta = \epsilon$,
and it follows that $g$ is continuous on all of $\mathbf{R}$. It is even simpler to show that
a constant function $f(x) = k$ is continuous. (Letting $\delta = 1$ regardless of the
value of $\epsilon$ does the trick.) Because an arbitrary polynomial

\[ p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n \]

consists of sums and products of $g(x)$ with different constant functions, we may
conclude from Theorem 4.3.4 that $p(x)$ is continuous.

Likewise, Theorem 4.3.4 implies that quotients of polynomials are continuous
as long as the denominator is not zero.

Example 4.3.6. In Example 4.2.6, we saw that the oscillations of $\sin(1/x)$ are
so rapid near the origin that $\lim_{x \to 0} \sin(1/x)$ does not exist. Now, consider the
function

\[ g(x) = \begin{cases} x \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]

To investigate the continuity of $g$ at $c = 0$ (Fig. 4.6), we can estimate

\[ |g(x) - g(0)| = |x \sin(1/x) - 0| \leq |x|. \]

Given $\epsilon > 0$, set $\delta = \epsilon$, so that whenever $|x - 0| = |x| < \delta$ it follows that
$|g(x) - g(0)| < \epsilon$. Thus, $g$ is continuous at the origin.
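
The estimate in Example 4.3.6 lends itself to a quick numerical spot check. The following Python sketch is a sanity check under floating-point arithmetic, not a substitute for the argument; it samples points within distance e of the origin and confirms that the values of the function stay within e of 0. The helper name `stays_within` and the sample count are assumptions for illustration only.

```python
import math


def g(x: float) -> float:
    """g(x) = x sin(1/x) for x != 0, and g(0) = 0, as in Example 4.3.6."""
    return x * math.sin(1.0 / x) if x != 0 else 0.0


def stays_within(eps: float, samples: int = 1000) -> bool:
    """With delta = eps, check |g(x) - g(0)| < eps at sampled |x| < delta."""
    delta = eps
    for k in range(1, samples):
        x = delta * k / samples  # 0 < x < delta
        if not (abs(g(x) - g(0.0)) < eps and abs(g(-x) - g(0.0)) < eps):
            return False
    return True


for eps in (1.0, 0.1, 1e-6):
    print(eps, stays_within(eps))
```

The check succeeds for the simple reason used in the example: the sine factor never exceeds 1 in absolute value, so the product is controlled by the factor x alone.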


Example 4.3.7. Throughout the exercises we have been using the greatest
integer function $h(x) = [[x]]$, which for each $x \in \mathbf{R}$ returns the largest integer
$n \in \mathbf{Z}$ satisfying $n \leq x$. This familiar step function certainly has discontinuous
“jumps” at each integer value of its domain, but it is a useful exercise to try
and articulate this observation in the language of analysis.

Given $m \in \mathbf{Z}$, define the sequence $(x_n)$ by $x_n = m - 1/n$. It follows that
$(x_n) \to m$, but

\[ h(x_n) \to (m - 1), \]

which does not equal $m = h(m)$. By Corollary 4.3.3, we see that $h$ fails to be
continuous at each $m \in \mathbf{Z}$.

Now let’s see why $h$ is continuous at a point $c \notin \mathbf{Z}$. Given $\epsilon > 0$, we must find
a $\delta$-neighborhood $V_\delta(c)$ such that $x \in V_\delta(c)$ implies $h(x) \in V_\epsilon(h(c))$. We know
that $c \in \mathbf{R}$ falls between consecutive integers $n < c < n + 1$ for some $n \in \mathbf{Z}$.
If we take $\delta = \min\{c - n, (n + 1) - c\}$, then it follows from the definition of $h$
that $h(x) = h(c)$ for all $x \in V_\delta(c)$. Thus, we certainly have

\[ h(x) \in V_\epsilon(h(c)) \]

whenever $x \in V_\delta(c)$.

This latter proof is quite different from the typical situation in that the value
of $\delta$ does not actually depend on the choice of $\epsilon$. Usually, a smaller $\epsilon$ requires a
smaller $\delta$ in response, but here the same value of $\delta$ works no matter how small
$\epsilon$ is chosen.
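
Both halves of Example 4.3.7 can be illustrated with the standard library’s `math.floor`, which plays the role of the greatest integer function. This is a minimal sketch under the assumption that floating-point inputs are chosen far enough from integers for rounding not to matter; the helper names `delta_for` and `constant_on_neighborhood` are hypothetical.

```python
import math


def delta_for(c: float) -> float:
    """For non-integer c with n < c < n + 1, return min{c - n, (n + 1) - c}."""
    n = math.floor(c)
    return min(c - n, (n + 1) - c)


def constant_on_neighborhood(c: float, samples: int = 1000) -> bool:
    """Check that floor(x) = floor(c) at sampled points with |x - c| < delta."""
    delta = delta_for(c)
    target = math.floor(c)
    for k in range(samples):
        offset = delta * k / samples  # 0 <= offset < delta
        if math.floor(c - offset) != target or math.floor(c + offset) != target:
            return False
    return True


# At an integer m = 3 the sequence x_n = m - 1/n has floor values m - 1 = 2.
print([math.floor(3 - 1 / n) for n in range(1, 6)])
# Away from the integers the function is locally constant.
print([constant_on_neighborhood(c) for c in (2.5, 0.1, -1.25, 7.75)])
```

Note that `delta_for` takes no challenge as an argument at all, which reflects the remark that the same response works for every challenge.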


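Example 4.3.8. Consider $f(x) = \sqrt{x}$ defined on $A = \{x \in \mathbf{R} : x \geq 0\}$.
Exercise 2.3.1 outlines a sequential proof that $f$ is continuous on $A$. Here, we
give an $\epsilon$-$\delta$ proof of the same fact.

Let $\epsilon > 0$. We need to argue that $|f(x) - f(c)|$ can be made less than $\epsilon$ for
all values of $x$ in some $\delta$-neighborhood around $c$. If $c = 0$, this reduces to the
statement $\sqrt{x} < \epsilon$, which happens as long as $x < \epsilon^2$. Thus, if we choose $\delta = \epsilon^2$,
we see that $|x - 0| < \delta$ implies $|f(x) - 0| < \epsilon$.

For a point $c \in A$ different from zero, we need to estimate $|\sqrt{x} - \sqrt{c}|$. This
time, write

\[ |\sqrt{x} - \sqrt{c}| = |\sqrt{x} - \sqrt{c}| \left( \frac{\sqrt{x} + \sqrt{c}}{\sqrt{x} + \sqrt{c}} \right) = \frac{|x - c|}{\sqrt{x} + \sqrt{c}} \leq \frac{|x - c|}{\sqrt{c}}. \]

In order to make this quantity less than $\epsilon$, it suffices to pick $\delta = \epsilon\sqrt{c}$. Then,
$|x - c| < \delta$ implies

\[ |\sqrt{x} - \sqrt{c}| < \frac{\epsilon\sqrt{c}}{\sqrt{c}} = \epsilon, \]

as desired.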
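
The two responses found in Example 4.3.8 can be checked on sample points. The sketch below is a numerical illustration only, assuming floating-point square roots from the `math` module; the names `delta_for` and `response_works` are hypothetical, and points outside the domain (negative x) are skipped.

```python
import math


def delta_for(c: float, eps: float) -> float:
    """Responses from Example 4.3.8: eps^2 at c = 0, eps * sqrt(c) otherwise."""
    return eps * eps if c == 0 else eps * math.sqrt(c)


def response_works(c: float, eps: float, samples: int = 1000) -> bool:
    """Check |sqrt(x) - sqrt(c)| < eps at sampled x >= 0 with |x - c| < delta."""
    delta = delta_for(c, eps)
    for k in range(samples):
        offset = delta * k / samples  # 0 <= offset < delta
        for x in (c - offset, c + offset):
            if x < 0:
                continue  # x must lie in the domain A = [0, infinity)
            if not abs(math.sqrt(x) - math.sqrt(c)) < eps:
                return False
    return True


for c in (0.0, 0.01, 1.0, 100.0):
    print(c, [response_works(c, eps) for eps in (1.0, 0.1, 1e-3)])
```

In contrast to the greatest integer function, the response here depends on both the challenge and the point c, shrinking as c moves toward zero.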
Although we have now shown that both polynomials and the square root
function are continuous, the Algebraic Continuity Theorem does not provide
the justification needed to conclude that a function such as $h(x) = \sqrt{3x^2 + 5}$ is
continuous. For this, we must prove that compositions of continuous functions
are continuous.

Theorem 4.3.9 (Composition of Continuous Functions). Given $f : A \to \mathbf{R}$
and $g : B \to \mathbf{R}$, assume that the range $f(A) = \{f(x) : x \in A\}$ is contained in
the domain $B$ so that the composition $g \circ f(x) = g(f(x))$ is defined on $A$.

If $f$ is continuous at $c \in A$, and if $g$ is continuous at $f(c) \in B$, then $g \circ f$ is
continuous at $c$.

Proof. Exercise 4.3.3. □ 


Exercises 


Exercise 4.3.1. Let $g(x) = \sqrt[3]{x}$.

(a) Prove that $g$ is continuous at $c = 0$.

(b) Prove that $g$ is continuous at a point $c \neq 0$. (The identity $a^3 - b^3 =
(a - b)(a^2 + ab + b^2)$ will be helpful.)

Exercise 4.3.2. To gain a deeper understanding of the relationship between
$\epsilon$ and $\delta$ in the definition of continuity, let’s explore some modest variations of
Definition 4.3.1. In all of these, let $f$ be a function defined on all of $\mathbf{R}$.

(a) Let’s say $f$ is onetinuous at $c$ if for all $\epsilon > 0$ we can choose $\delta = 1$ and it
follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is onetinuous on all of $\mathbf{R}$.

(b) Let’s say $f$ is equaltinuous at $c$ if for all $\epsilon > 0$ we can choose $\delta = \epsilon$ and it
follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is equaltinuous on $\mathbf{R}$ that is nowhere onetinuous, or explain
why there is no such function.

(c) Let’s say $f$ is lesstinuous at $c$ if for all $\epsilon > 0$ we can choose $0 < \delta < \epsilon$ and
it follows that $|f(x) - f(c)| < \epsilon$ whenever $|x - c| < \delta$. Find an example of a
function that is lesstinuous on $\mathbf{R}$ that is nowhere equaltinuous, or explain
why there is no such function.

(d) Is every lesstinuous function continuous? Is every continuous function
lesstinuous? Explain.

Exercise 4.3.3. (a) Supply a proof for Theorem 4.3.9 using the $\epsilon$-$\delta$
characterization of continuity.


(b) Give another proof of this theorem using the sequential characterization 
of continuity (from Theorem 4.3.2 (iii)). 




Exercise 4.3.4. Assume $f$ and $g$ are defined on all of $\mathbf{R}$ and that $\lim_{x \to p} f(x) = q$
and $\lim_{x \to q} g(x) = r$.

(a) Give an example to show that it may not be true that

\[ \lim_{x \to p} g(f(x)) = r. \]

(b) Show that the result in (a) does follow if we assume $f$ and $g$ are continuous.

(c) Does the result in (a) hold if we only assume $f$ is continuous? How about
if we only assume that $g$ is continuous?

Exercise 4.3.5. Show using Definition 4.3.1 that if $c$ is an isolated point of
$A \subseteq \mathbf{R}$, then $f : A \to \mathbf{R}$ is continuous at $c$.


Exercise 4.3.6. Provide an example of each or explain why the request is 
impossible. 


(a) Two functions $f$ and $g$, neither of which is continuous at 0 but such that
$f(x)g(x)$ and $f(x) + g(x)$ are continuous at 0.

(b) A function $f(x)$ continuous at 0 and $g(x)$ not continuous at 0 such that
$f(x) + g(x)$ is continuous at 0.

(c) A function $f(x)$ continuous at 0 and $g(x)$ not continuous at 0 such that
$f(x)g(x)$ is continuous at 0.

(d) A function f(x) not continuous at 0 such that f(x) + is continuous 
at 0. 



(e) A function $f(x)$ not continuous at 0 such that $[f(x)]^3$ is continuous at 0.


Exercise 4.3.7. (a) Referring to the proper theorems, give a formal argument
that Dirichlet’s function from Section 4.1 is nowhere-continuous
on $\mathbf{R}$.

(b) Review the definition of Thomae’s function in Section 4.1 and demonstrate
that it fails to be continuous at every rational point.

(c) Use the characterization of continuity in Theorem 4.3.2 (iii) to show that
Thomae’s function is continuous at every irrational point in $\mathbf{R}$. (Given
$\epsilon > 0$, consider the set of points $\{x \in \mathbf{R} : t(x) \geq \epsilon\}$.)


Exercise 4.3.8. Decide if the following claims are true or false, providing either 
a short proof or counterexample to justify each conclusion. Assume throughout 
that g is defined and continuous on all of R. 


(a) If $g(x) \geq 0$ for all $x < 1$, then $g(1) \geq 0$ as well.

(b) If $g(r) = 0$ for all $r \in \mathbf{Q}$, then $g(x) = 0$ for all $x \in \mathbf{R}$.

(c) If $g(x_0) > 0$ for a single point $x_0 \in \mathbf{R}$, then $g(x)$ is in fact strictly positive
for uncountably many points.

Exercise 4.3.9. Assume $h : \mathbf{R} \to \mathbf{R}$ is continuous on $\mathbf{R}$ and let $K = \{x :
h(x) = 0\}$. Show that $K$ is a closed set.

Exercise 4.3.10. Observe that if $a$ and $b$ are real numbers, then

\[ \max\{a, b\} = \frac{1}{2} \left[ (a + b) + |a - b| \right]. \]

(a) Show that if $f_1, f_2, \ldots, f_n$ are continuous functions, then

\[ g(x) = \max\{f_1(x), f_2(x), \ldots, f_n(x)\} \]

is a continuous function.

(b) Let’s explore whether the result in (a) extends to the infinite case. For
each $n \in \mathbf{N}$, define $f_n$ on $\mathbf{R}$ by

\[ f_n(x) = \begin{cases} 1 & \text{if } |x| \geq 1/n \\ n|x| & \text{if } |x| < 1/n. \end{cases} \]

Now explicitly compute $h(x) = \sup\{f_1(x), f_2(x), f_3(x), \ldots\}$.

Exercise 4.3.11 (Contraction Mapping Theorem). Let $f$ be a function
defined on all of $\mathbf{R}$, and assume there is a constant $c$ such that $0 < c < 1$ and

\[ |f(x) - f(y)| \leq c|x - y| \]

for all $x, y \in \mathbf{R}$.

(a) Show that $f$ is continuous on $\mathbf{R}$.

(b) Pick some point $y_1 \in \mathbf{R}$ and construct the sequence

\[ (y_1, f(y_1), f(f(y_1)), \ldots). \]

In general, if $y_{n+1} = f(y_n)$, show that the resulting sequence $(y_n)$ is a
Cauchy sequence. Hence we may let $y = \lim y_n$.

(c) Prove that $y$ is a fixed point of $f$ (i.e., $f(y) = y$) and that it is unique in
this regard.

(d) Finally, prove that if $x$ is any arbitrary point in $\mathbf{R}$, then the sequence
$(x, f(x), f(f(x)), \ldots)$ converges to $y$ defined in (b).
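
To get a feel for the iteration in Exercise 4.3.11 before proving anything, it can help to watch it run on one concrete contraction. The sketch below assumes the hypothetical example f(x) = x/2 + 1, which satisfies the hypothesis with c = 1/2 and has fixed point 2; neither this function nor the helper name `iterate` comes from the text, and the output illustrates the exercise rather than solving it.

```python
def f(x: float) -> float:
    """A sample contraction (an assumption for illustration): |f(x) - f(y)| = |x - y| / 2."""
    return 0.5 * x + 1.0


def iterate(start: float, steps: int) -> list[float]:
    """Return [y_1, f(y_1), f(f(y_1)), ...] with steps applications of f."""
    ys = [start]
    for _ in range(steps):
        ys.append(f(ys[-1]))
    return ys


for start in (10.0, -100.0, 2.0):
    ys = iterate(start, 60)
    # Different starting points appear to settle on the same value.
    print(start, ys[:4], ys[-1])
```

Every starting point tried settles near the same value, which is what parts (c) and (d) of the exercise ask to be proved in general.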

Exercise 4.3.12. Let $F \subseteq \mathbf{R}$ be a nonempty closed set and define $g(x) =
\inf\{|x - a| : a \in F\}$. Show that $g$ is continuous on all of $\mathbf{R}$ and $g(x) \neq 0$ for
all $x \notin F$.

Exercise 4.3.13. Let $f$ be a function defined on all of $\mathbf{R}$ that satisfies the
additive condition $f(x + y) = f(x) + f(y)$ for all $x, y \in \mathbf{R}$.

(a) Show that $f(0) = 0$ and that $f(-x) = -f(x)$ for all $x \in \mathbf{R}$.

(b) Let $k = f(1)$. Show that $f(n) = kn$ for all $n \in \mathbf{N}$, and then prove that
$f(z) = kz$ for all $z \in \mathbf{Z}$. Now, prove that $f(r) = kr$ for any rational
number $r$.

(c) Show that if $f$ is continuous at $x = 0$, then $f$ is continuous at every point
in $\mathbf{R}$ and conclude that $f(x) = kx$ for all $x \in \mathbf{R}$. Thus, any additive
function that is continuous at $x = 0$ must necessarily be a linear function
through the origin.

Exercise 4.3.14. (a) Let $F$ be a closed set. Construct a function $f : \mathbf{R} \to \mathbf{R}$
such that the set of points where $f$ fails to be continuous is precisely $F$.
(The concept of the interior of a set, discussed in Exercise 3.2.14, may be
useful.)

(b) Now consider an open set $O$. Construct a function $g : \mathbf{R} \to \mathbf{R}$ whose set
of discontinuous points is precisely $O$. (For this problem, the function in
Exercise 4.3.12 may be useful.)

4.4 Continuous Functions on Compact Sets 

Given a function $f : A \to \mathbf{R}$ and a subset $B \subseteq A$, the notation $f(B)$ refers to
the range of $f$ over the set $B$; that is,

\[ f(B) = \{f(x) : x \in B\}. \]

The adjectives open, closed, bounded, compact, perfect, and connected are
all used to describe subsets of the real line. An interesting question is to sort
out which, if any, of these properties are preserved when a particular set $B$ is
mapped to $f(B)$ via a continuous function. For instance, if $B$ is open and $f$
is continuous, is $f(B)$ necessarily open? The answer to this question is no. If
$f(x) = x^2$ and $B$ is the open interval $(-1, 1)$, then $f(B)$ is the interval $[0, 1)$,
which is not open.

The corresponding conjecture for closed sets also turns out to be false, although
constructing a counterexample requires a little more thought. Consider
the function

\[ g(x) = \frac{1}{1 + x^2} \]

and the closed set $B = [0, \infty) = \{x : x \geq 0\}$. Because $g(B) = (0, 1]$ is not
closed, we must conclude that continuous functions do not, in general, map
closed sets to closed sets. Notice, however, that our particular counterexample
required using an unbounded closed set $B$. This is not incidental. Sets that are
closed and bounded (that is, compact sets) always get mapped to closed and
bounded subsets by continuous functions.

Theorem 4.4.1 (Preservation of Compact Sets). Let $f : A \to \mathbf{R}$ be continuous
on $A$. If $K \subseteq A$ is compact, then $f(K)$ is compact as well.
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Proof. Let (y n ) be an arbitrary sequence contained in the range set /(if). 
To prove this result, we must find a subsequence (y nfe ), which converges to 
a limit also in /(if). The strategy is to take advantage of the assumption that 
the domain set if is compact by translating the sequence (y n ) — which is in the 
range of / — back to a sequence in the domain if. 

To assert that (y n ) C /(if) means that, for each n £ N, we can find (at least 
one) x n £ if with f(x n ) = y n . This yields a sequence (x n ) C if. Because if is 
compact, there exists a convergent subsequence (x nk ) whose limit x = \rmx nk 
is also in if. Finally, we make use of the fact that / is assumed to be continuous 
on A and so is continuous at x in particular. Given that (x nk ) -£ x, we conclude 
that (y nk ) f( x )• Because x £ FT, we have that f(x) £ /(if), and hence /(if ) 
is compact. □ 

An extremely important corollary is obtained by combining this result with 
the observation that compact sets are bounded and contain their supremums 
and infimums.

Theorem 4.4.2 (Extreme Value Theorem). If $f : K \to \mathbf{R}$ is continuous on
a compact set $K \subseteq \mathbf{R}$, then $f$ attains a maximum and minimum value. In other
words, there exist $x_0, x_1 \in K$ such that $f(x_0) \leq f(x) \leq f(x_1)$ for all $x \in K$.

Proof. Because $f(K)$ is compact, we can set $\alpha = \sup f(K)$ and know $\alpha \in f(K)$
(Exercise 3.3.1). It follows that there exists $x_1 \in K$ with $\alpha = f(x_1)$. The
argument for the minimum value is similar. □
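The Extreme Value Theorem asserts that the points $x_0$ and $x_1$ exist but gives no recipe for finding them. As a rough illustration only, the sketch below searches a finite grid on a closed interval; the function name `grid_extremes`, the grid size, and the sample function $x(1 - x)$ are all assumptions chosen for the example, and a grid search only approximates the true extreme points in general.

```python
def grid_extremes(func, a, b, n=1000):
    """Approximate where func is smallest and largest on [a, b] using n + 1 grid points."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    x_min = min(xs, key=func)   # grid point with the smallest value of func
    x_max = max(xs, key=func)   # grid point with the largest value of func
    return x_min, x_max

def f(x):
    return x * (1 - x)

# On the compact set [0, 1] the maximum 1/4 is attained at 1/2 and the minimum 0 at an endpoint.
x0, x1 = grid_extremes(f, 0.0, 1.0)
print(x0, f(x0))
print(x1, f(x1))
```

On a non-compact domain such as $(0, 1)$ the same function has no minimum value, which is why compactness appears in the hypothesis.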


Uniform Continuity 

Although we have proved that polynomials are always continuous on R, there 
is an important lesson to be learned by constructing direct proofs that the 
functions $f(x) = 3x + 1$ and $g(x) = x^2$ (previously studied in Example 4.2.2)
are everywhere continuous. 

Example 4.4.3. (i) To show directly that $f(x) = 3x + 1$ is continuous at
an arbitrary point $c \in \mathbf{R}$, we must argue that $|f(x) - f(c)|$ can be made
arbitrarily small for values of $x$ near $c$. Now,

\[ |f(x) - f(c)| = |(3x + 1) - (3c + 1)| = 3|x - c|, \]

so, given $\epsilon > 0$, we choose $\delta = \epsilon/3$. Then, $|x - c| < \delta$ implies

\[ |f(x) - f(c)| = 3|x - c| < 3\left(\frac{\epsilon}{3}\right) = \epsilon. \]

Of particular importance for this discussion is the fact that the choice of
$\delta$ is the same regardless of which point $c \in \mathbf{R}$ we are considering.

(ii) Let’s contrast this with what happens when we prove $g(x) = x^2$ is
continuous on $\mathbf{R}$. Given $c \in \mathbf{R}$, we have

\[ |g(x) - g(c)| = |x^2 - c^2| = |x - c||x + c|. \]

As discussed in Example 4.2.2, we need an upper bound on $|x + c|$, which
is obtained by insisting that our choice of $\delta$ not exceed 1. This guarantees
that all values of $x$ under consideration will necessarily fall in the interval
$(c - 1, c + 1)$. It follows that

\[ |x + c| \leq |x| + |c| < (|c| + 1) + |c| = 2|c| + 1. \]

Now, let $\epsilon > 0$. If we choose $\delta = \min\{1, \epsilon/(2|c| + 1)\}$, then
$|x - c| < \delta$ implies

\[ |g(x) - g(c)| = |x - c||x + c| < \left(\frac{\epsilon}{2|c| + 1}\right)(2|c| + 1) = \epsilon. \]


Now, there is nothing deficient about this argument, but it is important
to notice that, in the second proof, the algorithm for choosing the response $\delta$
depends on the value of $c$. The statement

\[ \delta = \frac{\epsilon}{2|c| + 1} \]

means that larger values of $c$ are going to require smaller values of $\delta$, a fact
that should be evident from a consideration of the graph of $g(x) = x^2$ (Fig. 4.7).
Given, say, $\epsilon = 1$, a response of $\delta = 1/3$ is sufficient for $c = 1$ because
$2/3 < x < 4/3$ certainly implies $0 < x^2 < 2$. However, if $c = 10$, then the steepness
of the graph of $g(x)$ means that a much smaller $\delta$ is required ($\delta = 1/21$ by our
rule) to force $99 < x^2 < 101$.
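The dependence of $\delta$ on $c$ can be checked numerically. The sketch below is illustrative only: `delta_for_square` encodes the rule $\delta = \min\{1, \epsilon/(2|c| + 1)\}$ from Example 4.4.3 (ii), while `check_delta` is a hypothetical helper that tests the inequality $|x^2 - c^2| < \epsilon$ on a finite grid of points, so it can support the claim but not prove it.

```python
def delta_for_square(c, eps):
    """Response delta = min{1, eps / (2|c| + 1)} from Example 4.4.3 (ii)."""
    return min(1.0, eps / (2 * abs(c) + 1))

def check_delta(c, eps, delta, samples=10001):
    """Test |x^2 - c^2| < eps on a grid of points strictly inside (c - delta, c + delta)."""
    for i in range(1, samples):
        x = c - delta + 2 * delta * i / samples
        if abs(x * x - c * c) >= eps:
            return False
    return True

# With eps = 1 the rule gives 1/3, 1/21 and 1/201 for c = 1, 10 and 100.
for c in (1, 10, 100):
    d = delta_for_square(c, 1.0)
    print(c, d, check_delta(c, 1.0, d))

# The response that works at c = 1 is too large at c = 10.
print(check_delta(10, 1.0, delta_for_square(1, 1.0)))
```

The last line reports a failure: near $c = 10$, a window of radius $1/3$ contains points whose squares differ from 100 by more than 1.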

The next definition is meant to distinguish between these two examples. 



Figure 4.7: $g(x) = x^2$; a larger $c$ requires a smaller $\delta$.

Definition 4.4.4 (Uniform Continuity). A function $f : A \to \mathbf{R}$ is uniformly
continuous on $A$ if for every $\epsilon > 0$ there exists a $\delta > 0$ such that for all $x, y \in A$,

\[ |x - y| < \delta \text{ implies } |f(x) - f(y)| < \epsilon. \]

Recall that to say that “$f$ is continuous on $A$” means that $f$ is continuous at
each individual point $c \in A$. In other words, given $\epsilon > 0$ and $c \in A$, we can find
a $\delta > 0$ perhaps depending on $c$ such that if $|x - c| < \delta$, then $|f(x) - f(c)| < \epsilon$.
Uniform continuity is a strictly stronger property. The key distinction between
asserting that $f$ is “uniformly continuous on $A$” versus simply “continuous on $A$”
is that, given an $\epsilon > 0$, a single $\delta > 0$ can be chosen that works simultaneously
for all points $c$ in $A$. To say that a function is not uniformly continuous on a set
$A$, then, does not necessarily mean it is not continuous at some point. Rather, it
means that there is some $\epsilon_0 > 0$ for which no single $\delta > 0$ is a suitable response
for all $c \in A$.
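Written out with explicit quantifiers, the two notions differ only in where the point $c$ enters relative to $\delta$. The display below is simply a restatement of the two definitions just discussed, arranged side by side for comparison; the layout is ours, not a new result.

```latex
\begin{align*}
\text{continuous on } A:\quad
  & \forall c \in A \;\; \forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in A:\;
    |x - c| < \delta \implies |f(x) - f(c)| < \epsilon, \\
\text{uniformly continuous on } A:\quad
  & \forall \epsilon > 0 \;\; \exists \delta > 0 \;\; \forall x, y \in A:\;
    |x - y| < \delta \implies |f(x) - f(y)| < \epsilon.
\end{align*}
```

In the first line $\delta$ is chosen after $c$ and may depend on it; in the second, $\delta$ is chosen before any points of $A$ are named.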


Theorem 4.4.5 (Sequential Criterion for Absence of Uniform Continuity).
A function $f : A \to \mathbf{R}$ fails to be uniformly continuous on $A$ if and
only if there exists a particular $\epsilon_0 > 0$ and two sequences $(x_n)$ and $(y_n)$ in $A$
satisfying

\[ |x_n - y_n| \to 0 \quad \text{but} \quad |f(x_n) - f(y_n)| \geq \epsilon_0. \]

Proof. The negation of Definition 4.4.4 states that $f$ is not uniformly continuous
on $A$ if and only if there exists $\epsilon_0 > 0$ such that for all $\delta > 0$ we can find two
points $x$ and $y$ satisfying $|x - y| < \delta$ but with $|f(x) - f(y)| \geq \epsilon_0$. Thus, if
we set $\delta_1 = 1$, then there exist two points $x_1$ and $y_1$ where $|x_1 - y_1| < 1$ but
$|f(x_1) - f(y_1)| \geq \epsilon_0$.

In a similar way, if we set $\delta_n = 1/n$ where $n \in \mathbf{N}$, it follows that there
exist points $x_n$ and $y_n$ with $|x_n - y_n| < 1/n$ but where $|f(x_n) - f(y_n)| \geq \epsilon_0$.
The resulting sequences $(x_n)$ and $(y_n)$ satisfy the requirements described in the
theorem.

Conversely, if $\epsilon_0$, $(x_n)$ and $(y_n)$ exist as described, it is straightforward to
see that no $\delta > 0$ is a suitable response for $\epsilon_0$. □

Example 4.4.6. The function $h(x) = \sin(1/x)$ (Fig. 4.5) is continuous at every
point in the open interval $(0, 1)$ but is not uniformly continuous on this interval.
The problem arises near zero, where the increasingly rapid oscillations take
domain values that are quite close together to range values a distance 2 apart.
To illustrate Theorem 4.4.5, take $\epsilon_0 = 2$ and set

\[ x_n = \frac{1}{\pi/2 + 2n\pi} \quad \text{and} \quad y_n = \frac{1}{3\pi/2 + 2n\pi}. \]

Because each of these sequences tends to zero, we have $|x_n - y_n| \to 0$, and a
short calculation reveals $|h(x_n) - h(y_n)| = 2$ for all $n \in \mathbf{N}$.
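The "short calculation" can be mirrored numerically. The sketch below evaluates the two sequences from Example 4.4.6 in floating point; the function names `h`, `x_n`, and `y_n` are just labels for the quantities in the text, and the printed values are subject to rounding error, so they illustrate rather than establish the claim.

```python
import math

def h(x):
    return math.sin(1.0 / x)

def x_n(n):
    return 1.0 / (math.pi / 2 + 2 * n * math.pi)

def y_n(n):
    return 1.0 / (3 * math.pi / 2 + 2 * n * math.pi)

for n in (1, 10, 100, 1000):
    gap = abs(x_n(n) - y_n(n))            # shrinks toward 0
    jump = abs(h(x_n(n)) - h(y_n(n)))     # stays (approximately) equal to 2
    print(n, gap, jump)
```

The gaps $|x_n - y_n|$ decrease toward zero while the differences in function values remain at 2 up to rounding, which is exactly the situation described in Theorem 4.4.5.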


Whereas continuity is defined at a single point, uniform continuity is always
discussed in reference to a particular domain. In Example 4.4.3, we were not
able to prove that $g(x) = x^2$ is uniformly continuous on $\mathbf{R}$ because larger
values of $x$ require smaller and smaller values of $\delta$. (As another illustration
of Theorem 4.4.5, take $x_n = n$ and $y_n = n + 1/n$.) It is true, however, that
$g(x)$ is uniformly continuous on the bounded set $[-10, 10]$. Returning to the
argument set forth in Example 4.4.3 (ii), notice that if we restrict our attention
to the domain $[-10, 10]$, then $|x + y| \leq 20$ for all $x$ and $y$. Given $\epsilon > 0$, we can
now choose $\delta = \epsilon/20$, and verify that if $x, y \in [-10, 10]$ satisfy $|x - y| < \delta$, then

\[ |g(x) - g(y)| = |x - y||x + y| < \left(\frac{\epsilon}{20}\right) 20 = \epsilon. \]

In fact, it is not difficult to see how to modify this argument to show that $g(x)$
is uniformly continuous on any bounded set $A$ in $\mathbf{R}$.
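Both halves of this discussion can be checked with exact rational arithmetic. In the sketch below, the sequences $x_n = n$ and $y_n = n + 1/n$ come from the text, while the helper names `uniform_delta` and `check_uniform` are assumptions introduced for illustration; the second helper only tests neighbouring points of a grid on $[-10, 10]$, so it is a sanity check and not a proof.

```python
from fractions import Fraction

def g(x):
    return x * x

# Sequences from the text: x_n = n and y_n = n + 1/n.
for n in (1, 10, 100, 1000):
    x = Fraction(n)
    y = Fraction(n) + Fraction(1, n)
    print(n, y - x, g(y) - g(x))   # gap is 1/n, difference is 2 + 1/n^2

def uniform_delta(eps):
    """Response on [-10, 10] from the text: delta = eps / 20, independent of the point."""
    return eps / 20

def check_uniform(eps):
    """Check |g(x) - g(y)| < eps for neighbouring grid points of [-10, 10] closer than delta."""
    step = uniform_delta(eps) / 2
    x = Fraction(-10)
    while x + step <= 10:
        if abs(g(x + step) - g(x)) >= eps:
            return False
        x += step
    return True

print(check_uniform(Fraction(1)))
```

Since $g(y_n) - g(x_n) = 2 + 1/n^2 \geq 2$ while $|x_n - y_n| = 1/n \to 0$, Theorem 4.4.5 applies on $\mathbf{R}$ with $\epsilon_0 = 2$.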

Now, Example 4.4.6 is included to keep us from jumping to the erroneous 
conclusion that functions that are continuous on bounded domains are neces- 
sarily uniformly continuous. A general result does follow, however, if we assume 
that the domain is compact. 


Theorem 4.4.7 (Uniform Continuity on Compact Sets). A function that
is continuous on a compact set $K$ is uniformly continuous on $K$.

Proof. Assume $f : K \to \mathbf{R}$ is continuous at every point of a compact set $K \subseteq \mathbf{R}$.
To prove that $f$ is uniformly continuous on $K$ we argue by contradiction.

By the criterion in Theorem 4.4.5, if $f$ is not uniformly continuous on $K$,
then there exist two sequences $(x_n)$ and $(y_n)$ in $K$ such that

\[ \lim |x_n - y_n| = 0 \quad \text{while} \quad |f(x_n) - f(y_n)| \geq \epsilon_0 \]

for some particular $\epsilon_0 > 0$. Because $K$ is compact, the sequence $(x_n)$ has a
convergent subsequence $(x_{n_k})$ with $x = \lim x_{n_k}$ also in $K$.

We could use the compactness of $K$ again to produce a convergent subsequence
of $(y_n)$, but notice what happens when we consider the particular subsequence
$(y_{n_k})$ consisting of those terms in $(y_n)$ that correspond to the terms
in the convergent subsequence $(x_{n_k})$. By the Algebraic Limit Theorem,

\[ \lim (y_{n_k}) = \lim ((y_{n_k} - x_{n_k}) + x_{n_k}) = 0 + x. \]

The conclusion is that both $(x_{n_k})$ and $(y_{n_k})$ converge to $x \in K$. Because $f$ is
assumed to be continuous at $x$, we have $\lim f(x_{n_k}) = f(x)$ and $\lim f(y_{n_k}) = f(x)$,
which implies

\[ \lim (f(x_{n_k}) - f(y_{n_k})) = 0. \]

A contradiction arises when we recall that $(x_n)$ and $(y_n)$ were chosen to satisfy

\[ |f(x_n) - f(y_n)| \geq \epsilon_0 \]

for all $n \in \mathbf{N}$. We conclude, then, that $f$ is indeed uniformly continuous on $K$. □

Exercises


Exercise 4.4.1. (a) Show that $f(x) = x^3$ is continuous on all of $\mathbf{R}$.

(b) Argue, using Theorem 4.4.5, that $f$ is not uniformly continuous on $\mathbf{R}$.

(c) Show that $f$ is uniformly continuous on any bounded subset of $\mathbf{R}$.

Exercise 4.4.2. (a) Is $f(x) = 1/x$ uniformly continuous on $(0, 1)$?

(b) Is $g(x) = \sqrt{x^2 + 1}$ uniformly continuous on $(0, 1)$?

(c) Is $h(x) = x \sin(1/x)$ uniformly continuous on $(0, 1)$?

Exercise 4.4.3. Show that $f(x) = 1/x^2$ is uniformly continuous on the set
$[1, \infty)$ but not on the set $(0, 1]$.

Exercise 4.4.4. Decide whether each of the following statements is true or
false, justifying each conclusion.

(a) If $f$ is continuous on $[a, b]$ with $f(x) > 0$ for all $a \leq x \leq b$, then $1/f$ is
bounded on $[a, b]$ (meaning $1/f$ has bounded range).

(b) If $f$ is uniformly continuous on a bounded set $A$, then $f(A)$ is bounded.

(c) If $f$ is defined on $\mathbf{R}$ and $f(K)$ is compact whenever $K$ is compact, then $f$
is continuous on $\mathbf{R}$.

Exercise 4.4.5. Assume that $g$ is defined on an open interval $(a, c)$ and it is
known to be uniformly continuous on $(a, b]$ and $[b, c)$, where $a < b < c$. Prove
that $g$ is uniformly continuous on $(a, c)$.

Exercise 4.4.6. Give an example of each of the following, or state that such a
request is impossible. For any that are impossible, supply a short explanation
for why this is the case.

(a) A continuous function $f : (0, 1) \to \mathbf{R}$ and a Cauchy sequence $(x_n)$ such
that $f(x_n)$ is not a Cauchy sequence;

(b) A uniformly continuous function $f : (0, 1) \to \mathbf{R}$ and a Cauchy sequence
$(x_n)$ such that $f(x_n)$ is not a Cauchy sequence;

(c) A continuous function $f : [0, \infty) \to \mathbf{R}$ and a Cauchy sequence $(x_n)$ such
that $f(x_n)$ is not a Cauchy sequence.

Exercise 4.4.7. Prove that $f(x) = \sqrt{x}$ is uniformly continuous on $[0, \infty)$.

Exercise 4.4.8. Give an example of each of the following, or provide a short
argument for why the request is impossible.

(a) A continuous function defined on $[0, 1]$ with range $(0, 1)$.

(b) A continuous function defined on $(0, 1)$ with range $[0, 1]$.

(c) A continuous function defined on $(0, 1]$ with range $(0, 1)$.

Exercise 4.4.9 (Lipschitz Functions). A function $f : A \to \mathbf{R}$ is called
Lipschitz if there exists a bound $M > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} \right| \leq M \]

for all $x \neq y \in A$. Geometrically speaking, a function $f$ is Lipschitz if there is a
uniform bound on the magnitude of the slopes of lines drawn through any two
points on the graph of $f$.

(a) Show that if $f : A \to \mathbf{R}$ is Lipschitz, then it is uniformly continuous on $A$.

(b) Is the converse statement true? Are all uniformly continuous functions
necessarily Lipschitz?

Exercise 4.4.10. Assume that $f$ and $g$ are uniformly continuous functions
defined on a common domain $A$. Which of the following combinations are
necessarily uniformly continuous on $A$:

\[ f(x) + g(x), \quad f(x)g(x), \quad \frac{f(x)}{g(x)}, \quad f(g(x))? \]

(Assume that the quotient and the composition are properly defined and thus
at least continuous.)

Exercise 4.4.11 (Topological Characterization of Continuity). Let $g$ be
defined on all of $\mathbf{R}$. If $B$ is a subset of $\mathbf{R}$, define the set $g^{-1}(B)$ by

\[ g^{-1}(B) = \{x \in \mathbf{R} : g(x) \in B\}. \]

Show that $g$ is continuous if and only if $g^{-1}(O)$ is open whenever $O \subseteq \mathbf{R}$ is an
open set.

Exercise 4.4.12. Review Exercise 4.4.11, and then determine which of the
following statements is true about a continuous function defined on $\mathbf{R}$:

(a) $f^{-1}(B)$ is finite whenever $B$ is finite.

(b) $f^{-1}(K)$ is compact whenever $K$ is compact.

(c) $f^{-1}(A)$ is bounded whenever $A$ is bounded.

(d) $f^{-1}(F)$ is closed whenever $F$ is closed.

Exercise 4.4.13 (Continuous Extension Theorem). (a) Show that a
uniformly continuous function preserves Cauchy sequences; that is, if
$f : A \to \mathbf{R}$ is uniformly continuous and $(x_n) \subseteq A$ is a Cauchy sequence,
then show $f(x_n)$ is a Cauchy sequence.

(b) Let $g$ be a continuous function on the open interval $(a, b)$. Prove that
$g$ is uniformly continuous on $(a, b)$ if and only if it is possible to define
values $g(a)$ and $g(b)$ at the endpoints so that the extended function $g$ is
continuous on $[a, b]$. (In the forward direction, first produce candidates
for $g(a)$ and $g(b)$, and then show the extended $g$ is continuous.)

Exercise 4.4.14. Construct an alternate proof of Theorem 4.4.7 using the
open cover characterization of compactness from the Heine-Borel Theorem
(Theorem 3.3.8 (iii)).


4.5 The Intermediate Value Theorem 


The Intermediate Value Theorem (IVT) is the name given to the very intuitive
observation that a continuous function $f$ on a closed interval $[a, b]$ attains every
value that falls between the range values $f(a)$ and $f(b)$ (Fig. 4.8).

Here is this observation in the language of analysis.

Theorem 4.5.1 (Intermediate Value Theorem). Let $f : [a, b] \to \mathbf{R}$ be
continuous. If $L$ is a real number satisfying $f(a) < L < f(b)$ or $f(a) > L > f(b)$,
then there exists a point $c \in (a, b)$ where $f(c) = L$.


This theorem was freely used by mathematicians of the 18th century (includ- 
ing Euler and Gauss) without any consideration of its validity. In fact, the first 
analytical proof was not offered until 1817 by Bolzano in a paper that also con- 
tains the first appearance of a somewhat modern definition of continuity. This 
emphasizes the significance of this result. As discussed in Section 4.1, Bolzano 
and his contemporaries had arrived at a point in the evolution of mathematics 
where it was becoming increasingly important to firm up the foundations of the 
subject. Doing so, however, was not simply a matter of going back and sup- 
plying the missing proofs. The real battle lay in first obtaining a thorough and 
mutually agreed-upon understanding of the relevant concepts. The importance 
of the Intermediate Value Theorem for us is similar in that our understanding 
of continuity and the nature of the real line is now mature enough for a proof to 
be possible. Indeed, there are several satisfying arguments for this simple result, 
each one isolating, in a slightly different way, the interplay between continuity 
and completeness. 


Preservation of Connected Sets 

The most potentially useful way to understand the Intermediate Value Theorem
(IVT) is as a special case of the fact that continuous functions map connected
sets to connected sets. In Theorem 4.4.1, we saw that if $f$ is a continuous function
on a compact set $K$, then the range set $f(K)$ is also compact. The analogous
observation holds for connected sets.

Theorem 4.5.2 (Preservation of Connected Sets). Let $f : G \to \mathbf{R}$ be
continuous. If $E \subseteq G$ is connected, then $f(E)$ is connected as well.

Proof. Intending to use the characterization of connected sets in Theorem 3.4.6,
let $f(E) = A \cup B$ where $A$ and $B$ are disjoint and nonempty. Our goal is to
produce a sequence contained in one of these sets that converges to a limit in
the other.

Let

\[ C = \{x \in E : f(x) \in A\} \quad \text{and} \quad D = \{x \in E : f(x) \in B\}. \]

The sets $C$ and $D$ are called the preimages of $A$ and $B$, respectively. Using the
properties of $A$ and $B$, it is straightforward to check that $C$ and $D$ are nonempty
and disjoint and satisfy $E = C \cup D$. Now, we are assuming $E$ is a connected
set, so by Theorem 3.4.6, there exists a sequence $(x_n)$ contained in one of $C$ or
$D$ with $x = \lim x_n$ contained in the other. Finally, because $f$ is continuous at $x$,
we get $f(x) = \lim f(x_n)$. Thus, it follows that $f(x_n)$ is a convergent sequence
contained in either $A$ or $B$ while the limit $f(x)$ is an element of the other. With
another nod to Theorem 3.4.6, the proof is complete. □

In R, a set is connected if and only if it is a (possibly unbounded) interval. 
This fact, together with Theorem 4.5.2, leads to a short proof of the Interme- 
diate Value Theorem (Exercise 4.5.1). We should point out that the proof of 
Theorem 4.5.2 does not make use of the equivalence between connected sets and 
intervals in R but relies only on the general definitions. The previous comment 
that this is the most useful way to approach IVT stems from the fact that, 
although it is not discussed here, the definitions of continuity and connected- 
ness can be easily adapted to higher-dimensional settings. Theorem 4.5.2, then, 
remains a valid conclusion in higher dimensions, whereas the Intermediate Value 
Theorem is essentially a one-dimensional result.

Completeness

A typical way the Intermediate Value Theorem is applied is to prove the existence
of roots. Given $f(x) = x^2 - 2$, for instance, we see that $f(1) = -1$ and
$f(2) = 2$. Therefore, there exists a point $c \in (1, 2)$ where $f(c) = 0$.

In this case, we can easily compute $c = \sqrt{2}$, meaning that we really did not
need IVT to show that $f$ has a root. We spent a good deal of time in Chapter 1
proving that $\sqrt{2}$ exists, which was only possible once we insisted on the Axiom of
Completeness as part of our assumptions about the real numbers. The fact that
the Intermediate Value Theorem has just asserted that $\sqrt{2}$ exists suggests that
another way to understand this result is in terms of the relationship between
the continuity of $f$ and the completeness of $\mathbf{R}$.

The Axiom of Completeness (AoC) from the first chapter states that 
“Nonempty sets that are bounded above have least upper bounds.” Later, we 
saw that the Nested Interval Property (NIP) is an equivalent way to assert that 
the real numbers have no “gaps.” Either of these characterizations of complete- 
ness can be used as the cornerstone for an alternate proof of Theorem 4.5.1. 


Proof. I. (First approach using AoC.) To simplify matters a bit, let’s consider
the special case where $f$ is a continuous function satisfying $f(a) < 0 < f(b)$ and
show that $f(c) = 0$ for some $c \in (a, b)$. First let

\[ K = \{x \in [a, b] : f(x) \leq 0\}. \]

Notice that $K$ is bounded above by $b$, and $a \in K$ so $K$ is not empty. Thus we
may appeal to the Axiom of Completeness to assert that $c = \sup K$ exists.
There are three cases to consider:

\[ f(c) > 0, \quad f(c) < 0, \quad \text{and} \quad f(c) = 0. \]

The fact that $c$ is the least upper bound of $K$ can be used to rule out the first
two cases, resulting in the desired conclusion that $f(c) = 0$. The details are
requested in Exercise 4.5.5(a).

II. (Second approach using NIP.) Again, consider the special case where
$L = 0$ and $f(a) < 0 < f(b)$. Let $I_0 = [a, b]$, and consider the midpoint

\[ z = (a + b)/2. \]

If $f(z) \geq 0$, then set $a_1 = a$ and $b_1 = z$. If $f(z) < 0$, then set $a_1 = z$ and $b_1 = b$.
In either case, the interval $I_1 = [a_1, b_1]$ has the property that $f$ is negative at
the left endpoint and nonnegative at the right.

This procedure can be inductively repeated, setting the stage for an application
of the Nested Interval Property. The remainder of the argument is left as
Exercise 4.5.5(b). □
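The halving procedure in the second approach is the bisection method in disguise. The sketch below carries it out in floating point for $f(x) = x^2 - 2$ on $[1, 2]$; the function name `bisect_root` and the default of 50 steps are assumptions made for illustration, and the code produces only a short interval $[a_n, b_n]$, whereas the proof uses the Nested Interval Property to obtain an actual limit point.

```python
def bisect_root(func, a, b, steps=50):
    """Nested-interval search assuming func is continuous with func(a) < 0 <= func(b)."""
    if not (func(a) < 0 <= func(b)):
        raise ValueError("need func(a) < 0 <= func(b)")
    for _ in range(steps):
        z = (a + b) / 2
        if func(z) >= 0:
            b = z          # keep [a, z]: negative at the left, nonnegative at the right
        else:
            a = z          # keep [z, b]: the same sign pattern is preserved
    return a, b

# f(x) = x^2 - 2 satisfies f(1) = -1 < 0 < 2 = f(2).
left, right = bisect_root(lambda x: x * x - 2, 1.0, 2.0)
print(left, right)
```

Each pass halves the interval while preserving the sign pattern at the endpoints, so the final interval has length $2^{-50}$ and brackets $\sqrt{2}$.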

The Intermediate Value Property 

Does the Intermediate Value Theorem have a converse? 

Definition 4.5.3. A function $f$ has the intermediate value property on an
interval $[a, b]$ if for all $x < y$ in $[a, b]$ and all $L$ between $f(x)$ and $f(y)$, it is
always possible to find a point $c \in (x, y)$ where $f(c) = L$.

Another way to summarize the Intermediate Value Theorem is to say that
every continuous function on $[a, b]$ has the intermediate value property. There
is an understandable temptation to suspect that any function that has the
intermediate value property must necessarily be continuous, but that is not the
case. We have seen that

\[ g(x) = \begin{cases} \sin(1/x) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \]

is not continuous at zero (Example 4.2.6), but it does have the intermediate
value property on $[0, 1]$.

The intermediate value property does imply continuity if we insist that our 
function is monotone (Exercise 4.5.3). 
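To see why the discontinuity at zero does not spoil the intermediate value property for $g(x) = \sin(1/x)$, observe that $g$ takes every value in $[-1, 1]$ on every interval $(0, y)$. The sketch below makes this concrete; the helper `point_with_value` and its choice of the integer $k$ are our own illustrative construction, not something taken from the text, and the final comparison holds only up to floating-point rounding.

```python
import math

def g(x):
    """The function from the text: sin(1/x) for x != 0 and 0 at x = 0."""
    return math.sin(1.0 / x) if x != 0 else 0.0

def point_with_value(L, y):
    """Return some c in (0, y) with g(c) = L, assuming -1 <= L <= 1 and 0 < y <= 1."""
    base = math.asin(L)                            # sin(base) = L, base in [-pi/2, pi/2]
    k = math.floor(1.0 / (2 * math.pi * y)) + 2    # large enough to force 0 < c < y
    return 1.0 / (base + 2 * math.pi * k)

for L in (-1.0, -0.3, 0.0, 0.5, 1.0):
    c = point_with_value(L, 0.01)
    print(L, c, g(c))
```

Because $1/c = \arcsin(L) + 2\pi k$, we get $\sin(1/c) = L$, and taking $k$ large pushes $c$ as close to zero as desired.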

Exercises 

Exercise 4.5.1. Show how the Intermediate Value Theorem follows as a corol- 
lary to Theorem 4.5.2. 

Exercise 4.5.2. Provide an example of each of the following, or explain why
the request is impossible.

(a) A continuous function defined on an open interval with range equal to a
closed interval.

(b) A continuous function defined on a closed interval with range equal to an
open interval.

(c) A continuous function defined on an open interval with range equal to an
unbounded closed set different from $\mathbf{R}$.

(d) A continuous function defined on all of $\mathbf{R}$ with range equal to $\mathbf{Q}$.

Exercise 4.5.3. A function $f$ is increasing on $A$ if $f(x) \leq f(y)$ for all $x < y$
in $A$. Show that if $f$ is increasing on $[a, b]$ and satisfies the intermediate value
property (Definition 4.5.3), then $f$ is continuous on $[a, b]$.

Exercise 4.5.4. Let $g$ be continuous on an interval $A$ and let $F$ be the set of
points where $g$ fails to be one-to-one; that is,

\[ F = \{x \in A : g(x) = g(y) \text{ for some } y \neq x \text{ and } y \in A\}. \]

Show $F$ is either empty or uncountable.

Exercise 4.5.5. (a) Finish the proof of the Intermediate Value Theorem
using the Axiom of Completeness started previously.

(b) Finish the proof of the Intermediate Value Theorem using the Nested
Interval Property started previously.

Exercise 4.5.6. Let $f : [0, 1] \to \mathbf{R}$ be continuous with $f(0) = f(1)$.

(a) Show that there must exist $x, y \in [0, 1]$ satisfying $|x - y| = 1/2$ and
$f(x) = f(y)$.

(b) Show that for each $n \in \mathbf{N}$ there exist $x_n, y_n \in [0, 1]$ with $|x_n - y_n| = 1/n$
and $f(x_n) = f(y_n)$.

(c) If $h \in (0, 1/2)$ is not of the form $1/n$, there does not necessarily exist
$|x - y| = h$ satisfying $f(x) = f(y)$. Provide an example that illustrates
this using $h = 2/5$.

Exercise 4.5.7. Let $f$ be a continuous function on the closed interval $[0, 1]$
with range also contained in $[0, 1]$. Prove that $f$ must have a fixed point; that
is, show $f(x) = x$ for at least one value of $x \in [0, 1]$.

Exercise 4.5.8 (Inverse functions). If a function $f : A \to \mathbf{R}$ is one-to-one,
then we can define the inverse function $f^{-1}$ on the range of $f$ in the natural
way: $f^{-1}(y) = x$ where $y = f(x)$.

Show that if $f$ is continuous on an interval $[a, b]$ and one-to-one, then $f^{-1}$ is
also continuous.

4.6 Sets of Discontinuity

Given a function $f : \mathbf{R} \to \mathbf{R}$, define $D_f \subseteq \mathbf{R}$ to be the set of points where
the function $f$ fails to be continuous. In Section 4.1, we saw that Dirichlet’s
function $g(x)$ had $D_g = \mathbf{R}$. The modification $h(x)$ of Dirichlet’s function had
$D_h = \mathbf{R} \setminus \{0\}$, zero being the only point of continuity. Finally, for Thomae’s
function $t(x)$, we saw that $D_t = \mathbf{Q}$.

Exercise 4.6.1. Using modifications of these functions, construct a function
$f : \mathbf{R} \to \mathbf{R}$ so that

(a) $D_f = \mathbf{Z}^c$.

(b) $D_f = \{x : 0 < x \leq 1\}$.

Exercise 4.6.2. Given a countable set $A = \{a_1, a_2, a_3, \ldots\}$, define $f(a_n) = 1/n$
and $f(x) = 0$ for all $x \notin A$. Find $D_f$.

We concluded the introduction with a question about whether $D_f$ could take
the form of any arbitrary subset of the real line. As it turns out, this is not
the case. The set of discontinuities of a real-valued function on $\mathbf{R}$ has a specific
topological structure that is not possessed by every subset of $\mathbf{R}$. Specifically,
$D_f$, no matter how $f$ is chosen, can always be written as the countable union
of closed sets. In the case where $f$ is monotone, these closed sets can be taken
to be single points.

Monotone Functions 

Classifying $D_f$ for an arbitrary $f$ is somewhat involved, so it is interesting that
describing $D_f$ is fairly straightforward for the class of monotone functions.

Definition 4.6.1. A function $f : A \to \mathbf{R}$ is increasing on $A$ if $f(x) \leq f(y)$
whenever $x < y$ and decreasing if $f(x) \geq f(y)$ whenever $x < y$ in $A$. A
monotone function is one that is either increasing or decreasing.

Continuity of $f$ at a point $c$ means that $\lim_{x \to c} f(x) = f(c)$. One particular
way for a discontinuity to occur is if the limit from the right at $c$ is different
from the limit from the left at $c$. As always with new terminology, we need to
be precise about what we mean by “from the left” and “from the right.”

Definition 4.6.2. Given a limit point $c$ of a set $A$ and a function $f : A \to \mathbf{R}$,
we write

\[ \lim_{x \to c^+} f(x) = L \]

if for all $\epsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - L| < \epsilon$ whenever $0 < x - c < \delta$.

Equivalently, in terms of sequences, $\lim_{x \to c^+} f(x) = L$ if $\lim f(x_n) = L$ for
all sequences $(x_n)$ satisfying $x_n > c$ and $\lim(x_n) = c$.

Exercise 4.6.3. State a similar definition for the left-hand limit

\[ \lim_{x \to c^-} f(x) = L. \]

Theorem 4.6.3. Given $f : A \to \mathbf{R}$ and a limit point $c$ of $A$, $\lim_{x \to c} f(x) = L$
if and only if

\[ \lim_{x \to c^-} f(x) = L \quad \text{and} \quad \lim_{x \to c^+} f(x) = L. \]

Exercise 4.6.4. Supply a proof for this proposition.

Generally speaking, discontinuities can be divided into three categories:

(i) If $\lim_{x \to c} f(x)$ exists but has a value different from $f(c)$, the discontinuity
at $c$ is called removable.

(ii) If $\lim_{x \to c^+} f(x) \neq \lim_{x \to c^-} f(x)$, then $f$ has a jump discontinuity at $c$.

(iii) If $\lim_{x \to c} f(x)$ does not exist for some other reason, then the discontinuity
at $c$ is called an essential discontinuity.

We are now equipped to characterize the set $D_f$ for an arbitrary monotone
function $f$.

Exercise 4.6.5. Prove that the only type of discontinuity a monotone function
can have is a jump discontinuity.

Exercise 4.6.6. Construct a bijection between the set of jump discontinuities
of a monotone function $f$ and a subset of $\mathbf{Q}$. Conclude that $D_f$ for a monotone
function $f$ must either be finite or countable, but not uncountable.

$D_f$ for an Arbitrary Function

Recall that the intersection of an infinite collection of closed sets is closed, but
for unions we must restrict ourselves to finite collections of closed sets in order
to ensure the union is closed. For open sets the situation is reversed. The
arbitrary union of open sets is open, but only finite intersections of open sets
are necessarily open.

Definition 4.6.4. A set that can be written as the countable union of closed
sets is in the class $F_\sigma$. (This definition also appeared in Section 3.5.)

In Section 4.1 we constructed functions where the set of discontinuity was $\mathbf{R}$
(Dirichlet’s function), $\mathbf{R} \setminus \{0\}$ (modified Dirichlet function), and $\mathbf{Q}$ (Thomae’s
function).

Exercise 4.6.7. (a) Show that in each of the above cases we get an $F_\sigma$ set
as the set where the function is discontinuous.

(b) Show that the two sets of discontinuity in Exercise 4.6.1 are $F_\sigma$ sets.

The upcoming argument depends on a concept called $\alpha$-continuity.

Definition 4.6.5. Let $f$ be defined on $\mathbf{R}$, and let $\alpha > 0$. The function $f$ is
$\alpha$-continuous at $x \in \mathbf{R}$ if there exists a $\delta > 0$ such that for all $y, z \in (x - \delta, x + \delta)$
it follows that $|f(y) - f(z)| < \alpha$.

The most important thing to note about this definition is that there is no
“for all” in front of the $\alpha > 0$. As we will investigate, adding this quantifier
would make this definition equivalent to our definition of continuity. In a sense,
$\alpha$-continuity is a measure of the variation of the function in the neighborhood
of a particular point. A function is $\alpha$-continuous at a point $c$ if there is some
interval centered at $c$ in which the variation of the function never exceeds the
value $\alpha > 0$.
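The idea of "variation near a point" can be estimated numerically. The following is a rough sketch under stated assumptions: `variation` is a hypothetical helper that samples finitely many points of $(x - \delta, x + \delta)$, so it only gives a lower bound for the true variation, and the unit step function `step` is an example chosen by us rather than one taken from the text.

```python
def variation(func, x, delta, samples=2001):
    """Largest |func(y) - func(z)| over a grid of points inside (x - delta, x + delta)."""
    pts = [x - delta + 2 * delta * i / samples for i in range(1, samples)]
    values = [func(p) for p in pts]
    return max(values) - min(values)

def step(x):
    """Unit step: 1 for x >= 0 and 0 for x < 0."""
    return 1.0 if x >= 0 else 0.0

for delta in (1.0, 0.1, 0.001):
    print(delta, variation(step, 0.0, delta), variation(step, 0.5, delta))
```

At the point 0 the sampled variation stays at 1 however small $\delta$ is taken, suggesting that the step function is $\alpha$-continuous there only when $\alpha > 1$. At the point 0.5 the variation drops to 0 once the window avoids the jump, so the function is $\alpha$-continuous there for every $\alpha > 0$.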

Given a function $f$ on $\mathbf{R}$, define $D_f^\alpha$ to be the set of points where the function
$f$ fails to be $\alpha$-continuous. In other words,

\[ D_f^\alpha = \{x \in \mathbf{R} : f \text{ is not } \alpha\text{-continuous at } x\}. \]

Exercise 4.6.8. Prove that, for a fixed $\alpha > 0$, the set $D_f^\alpha$ is closed.

The stage is set. It is time to characterize the set of discontinuity for an
arbitrary function $f$ on $\mathbf{R}$.

Theorem 4.6.6. Let $f : \mathbf{R} \to \mathbf{R}$ be an arbitrary function. Then, $D_f$ is an $F_\sigma$
set.

Proof. Recall that

\[ D_f = \{x \in \mathbf{R} : f \text{ is not continuous at } x\}. \]

Exercise 4.6.9. If $\alpha < \alpha'$, show that $D_f^{\alpha'} \subseteq D_f^\alpha$.

Exercise 4.6.10. Let $\alpha > 0$ be given. Show that if $f$ is continuous at $x$, then
it is $\alpha$-continuous at $x$ as well. Explain how it follows that $D_f^\alpha \subseteq D_f$.

Exercise 4.6.11. Show that if $f$ is not continuous at $x$, then $f$ is not
$\alpha$-continuous for some $\alpha > 0$. Now explain why this guarantees that

\[ D_f = \bigcup_{n=1}^{\infty} D_f^{\alpha_n}, \]

where $\alpha_n = 1/n$.

Because each $D_f^{\alpha_n}$ is closed, the proof is complete. □

4.7 Epilogue

Theorem 4.6.6 is only interesting if we can demonstrate that not every subset
of $\mathbf{R}$ is an $F_\sigma$ set. This takes some effort and was included as an exercise in
Section 3.5 on the Baire Category Theorem. Baire’s Theorem states that if $\mathbf{R}$ is
written as the countable union of closed sets, then at least one of these sets must
contain a nonempty open interval. Now $\mathbf{Q}$ is the countable union of singleton
points, and we can view each point as a closed set that obviously contains no
intervals. If the set of irrationals $\mathbf{I}$ were a countable union of closed sets, it would
have to be that none of these closed sets contained any open intervals or else they
would then contain some rational numbers. But then $\mathbf{R} = \mathbf{Q} \cup \mathbf{I}$ would be a
countable union of closed sets, none of which contains an open interval,
contradicting Baire’s Theorem. Thus, $\mathbf{I}$ is not the countable union of closed sets, and
consequently it is not an $F_\sigma$ set. We may therefore conclude that there is no
function $f$ that is continuous at every rational point and discontinuous at every
irrational point. This should be compared with Thomae’s function discussed
earlier.

The converse question is interesting as well. Given an arbitrary $F_\sigma$ set, W.H.
Young showed in 1903 that it is always possible to construct a function that has
discontinuities precisely on this set. Exercise 4.3.14 gives some clues for how to
do this in the much simpler case of an arbitrary closed set, and Exercise 4.6.2
handles the case of an arbitrary countable set. Young’s construction involves
some of these same techniques, as well as the Dirichlet-type definitions we have
seen, but is understandably more intricate. By contrast, a function demonstrating
the converse for the monotone case is not too difficult to describe. Let

\[ D = \{x_1, x_2, x_3, x_4, \ldots\} \]

be an arbitrary countable set of real numbers. In order to construct a monotone
function that has discontinuities precisely on $D$, we first consider a particular
$x_n \in D$ and define the step function

\[ u_n(x) = \begin{cases} 1/2^n & \text{for } x > x_n \\ 0 & \text{for } x \leq x_n. \end{cases} \]

Observing that each $u_n(x)$ is monotone and everywhere continuous except for
a single discontinuity at $x_n$, we now set

\[ f(x) = \sum_{n=1}^{\infty} u_n(x). \]

The convergence of the series $\sum 1/2^n$ guarantees that our function $f$ is defined
on all of $\mathbf{R}$, and intuition certainly suggests that $f$ is monotone with jump
discontinuities precisely on $D$. Providing a rigorous proof for this conclusion is
one of the many pleasures that awaits in Chapter 6, where we take up the study
of infinite series of functions.
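A finite version of this construction can be computed exactly. In the sketch below, the four-element list `D` is a hypothetical stand-in for the countable set in the text, and `partial_sum` evaluates only finitely many of the step functions $u_n$; it therefore illustrates the sizes and locations of the jumps but says nothing about the convergence questions deferred to Chapter 6.

```python
from fractions import Fraction

def u(n, x, points):
    """Step function u_n: 1/2^n for x > x_n and 0 for x <= x_n (points[n - 1] is x_n)."""
    return Fraction(1, 2 ** n) if x > points[n - 1] else Fraction(0)

def partial_sum(x, points):
    """Finite partial sum of f(x) = sum of u_n(x), exact because every term is rational."""
    return sum((u(n, x, points) for n in range(1, len(points) + 1)), Fraction(0))

# Hypothetical finite stand-in for the countable set D = {x_1, x_2, x_3, x_4, ...}.
D = [Fraction(0), Fraction(1, 2), Fraction(-1), Fraction(1, 3)]

eps = Fraction(1, 1000)   # smaller than the distance between any two points of D
for n, x_n in enumerate(D, start=1):
    jump = partial_sum(x_n + eps, D) - partial_sum(x_n, D)
    print(x_n, jump)      # a jump of size 1/2^n just to the right of x_n
```

The partial sum is increasing, vanishes to the left of every point of `D`, and jumps by exactly $1/2^n$ immediately to the right of $x_n$.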


Chapter 5 

The Derivative 


5.1 Discussion: Are Derivatives Continuous? 


The geometric motivation for the derivative is most likely familiar territory. 
Given a function g(x), the derivative g'(x) is understood to be the slope of the 
graph of g at each point x in the domain. A graphical picture (Fig. 5.1) reveals 
the impetus behind the mathematical definition 


\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}. \]

The difference quotient $(g(x) - g(c))/(x - c)$ represents the slope of the line
through the two points $(x, g(x))$ and $(c, g(c))$. By taking the limit as $x$ approaches
$c$, we arrive at a well-defined mathematical meaning for the slope of the tangent
line at $x = c$.
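The limiting process can be watched numerically. The sketch below computes difference quotients for the sample function $g(x) = x^2$ at $c = 3$; the helper name `difference_quotient` and the choice of sample function are assumptions made for illustration, and floating-point rounding means the printed slopes only approximate the exact values.

```python
def difference_quotient(g, c, x):
    """Slope of the line through (c, g(c)) and (x, g(x)); requires x != c."""
    return (g(x) - g(c)) / (x - c)

def g(x):
    return x * x

c = 3.0
for k in range(1, 6):
    x = c + 10.0 ** (-k)
    print(x, difference_quotient(g, c, x))   # slopes approach 6 as x approaches c
```

For this particular $g$ the quotient simplifies to $x + c$, so the slopes tend to $2c = 6$, in agreement with the familiar formula from calculus.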

The myriad applications of the derivative function are the topic of much 
of the calculus sequence, as well as several other upper-level courses in mathe- 
matics. None of these applied questions are pursued here in any length, but it 
should be pointed out that the rigorous underpinnings for differentiation worked 
out in this chapter are an essential foundation for any applied study. Eventu- 
ally, as the derivative is subjected to more and more complex manipulations, 
it becomes crucial to know precisely how differentiation is defined and how it 
interacts with other mathematical operations. 

Although physical applications are not explicitly discussed, we will encounter 
several questions of a more abstract quality as we develop the theory. Many of 
these are concerned with the relationship between differentiation and continuity. 
Are continuous functions always differentiable? If not, how nondifferentiable can 
a continuous function be? Are differentiable functions continuous? Given that
a function $f$ has a derivative at every point in its domain, what can we say
about the function $f'$? Is $f'$ continuous? How accurately can we describe the
set of all possible derivatives, or are there no restrictions? Put another way, if
we are given an arbitrary function $g$, is it always possible to find a differentiable
function $f$ such that $f' = g$, or are there some properties that $g$ must possess for
this to occur? In our study of continuity, we saw that restricting our attention
to monotone functions had a significant impact on the answers to questions
about sets of discontinuity. What effect, if any, does this same restriction have
on our questions about potential sets of nondifferentiable points? Some of these
issues are harder to resolve than others, and some remain unanswered in any
satisfactory way.

Figure 5.1: Definition of $g'(c)$.

© Springer Science+Business Media New York 2015
S. Abbott, Understanding Analysis, Undergraduate Texts
in Mathematics, DOI 10.1007/978-1-4939-2712-8_5

A particularly useful class of examples for this discussion are functions of 
the form 


\[
g_n(x) =
\begin{cases}
x^n \sin(1/x) & \text{if } x \ne 0 \\
0 & \text{if } x = 0.
\end{cases}
\]


When $n = 0$, we have seen (Example 4.2.6) that the oscillations of $\sin(1/x)$
prevent $g_0(x)$ from being continuous at $x = 0$. When $n = 1$, these oscillations
are squeezed between $|x|$ and $-|x|$, the result being that $g_1$ is continuous at
$x = 0$ (Example 4.3.6). Is $g_1'(0)$ defined? Using the preceding definition, we get

\[ g_1'(0) = \lim_{x \to 0} \frac{g_1(x)}{x} = \lim_{x \to 0} \sin(1/x), \]

which, as we now know, does not exist. Thus, $g_1$ is not differentiable at $x = 0$.
On the other hand, the same calculation shows that $g_2$ is differentiable at zero.
In fact, we have

\[ g_2'(0) = \lim_{x \to 0} x \sin(1/x) = 0. \]


At points different from zero, we can use the familiar rules of differentiation
(soon to be justified) to conclude that $g_2$ is differentiable everywhere in $\mathbf{R}$ with

\[
g_2'(x) =
\begin{cases}
-\cos(1/x) + 2x \sin(1/x) & \text{if } x \ne 0 \\
0 & \text{if } x = 0.
\end{cases}
\]

Figure 5.2: The function $g_2(x) = x^2 \sin(1/x)$ near zero.



But now consider

\[ \lim_{x \to 0} g_2'(x). \]

Because the $\cos(1/x)$ term is not preceded by a factor of $x$, we must conclude
that this limit does not exist and that, consequently, the derivative function
is not continuous. To summarize, the function $g_2(x)$ is continuous and differentiable
everywhere on $\mathbf{R}$ (Fig. 5.2), the derivative function $g_2'$ is thus defined
everywhere on $\mathbf{R}$, but $g_2'$ has a discontinuity at zero. The conclusion is that
derivatives need not, in general, be continuous!
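
A quick floating-point experiment makes both halves of this conclusion concrete. The sketch below is an illustration only, with sample points of our own choosing: the difference quotients of $g_2$ at zero shrink with $h$, while the derivative sampled along $x_k = 1/(k\pi)$ keeps flipping between values near $-1$ and $+1$.

```python
import math

def g2(x):
    """g_2(x) = x^2 sin(1/x) for x != 0, and g_2(0) = 0."""
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def g2_prime(x):
    """g_2'(x) = -cos(1/x) + 2x sin(1/x) for x != 0, and g_2'(0) = 0."""
    return -math.cos(1 / x) + 2 * x * math.sin(1 / x) if x != 0 else 0.0

# Difference quotients at 0 equal h*sin(1/h), so they are bounded by |h|.
steps = (1e-2, 1e-4, 1e-6)
quotients = [g2(h) / h for h in steps]

# Along x_k = 1/(k*pi), cos(1/x_k) is (-1)^k up to rounding,
# so g_2'(x_k) alternates in sign and stays near -1 or +1.
slopes = [g2_prime(1 / (k * math.pi)) for k in range(1000, 1006)]

print(quotients)
print([round(s, 6) for s in slopes])
```

The alternating slopes are exactly the obstruction to the existence of the limit of $g_2'(x)$ at zero, even though $g_2'(0)$ itself is perfectly well defined.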

The discontinuity in $g_2'$ is essential, meaning $\lim_{x \to 0} g_2'(x)$ does not exist as a
one-sided limit. But, what about a function with a simple jump discontinuity?
For example, does there exist a function $h$ such that

\[
h'(x) =
\begin{cases}
-1 & \text{if } x \le 0 \\
1 & \text{if } x > 0?
\end{cases}
\]

A first impression may bring to mind the absolute value function, which has
slopes of $-1$ at points to the left of zero and slopes of 1 to the right. However, the
absolute value function is not differentiable at zero. We are seeking a function
that is differentiable everywhere, including the point zero, where we are insisting
that the slope of the graph be $-1$. The degree of difficulty of this request should
start to become apparent. Without sacrificing differentiability at any point, we
are demanding that the slopes jump from $-1$ to 1 and not attain any value in
between.

Although we have seen that continuity is not a required property of deriva- 
tives, the intermediate value property will prove a more stubborn quality to 
ignore.

5.2 Derivatives and the Intermediate Value Property


Although the definition would technically make sense for more complicated 
domains, all of the interesting results about the relationship between a func- 
tion and its derivative require that the domain of the given function be an 
interval. Thinking geometrically of the derivative as a rate of change, it should 
not be too surprising that we would want to confine the independent variable 
to move about a connected domain. 

The theory of functional limits from Section 4.2 is all that is needed to supply 
a rigorous definition for the derivative. 

Definition 5.2.1 (Differentiability). Let $g : A \to \mathbf{R}$ be a function defined
on an interval $A$. Given $c \in A$, the derivative of $g$ at $c$ is defined by

\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c}, \]

provided this limit exists. In this case we say $g$ is differentiable at $c$. If $g'$ exists
for all points $c \in A$, we say that $g$ is differentiable on $A$.

Example 5.2.2. (i) Consider $f(x) = x^n$, where $n \in \mathbf{N}$, and let $c$ be any
arbitrary point in $\mathbf{R}$. Using the algebraic identity

\[ x^n - c^n = (x - c)(x^{n-1} + cx^{n-2} + c^2x^{n-3} + \cdots + c^{n-1}), \]

we can calculate the familiar formula

\begin{align*}
f'(c) = \lim_{x \to c} \frac{x^n - c^n}{x - c}
  &= \lim_{x \to c} (x^{n-1} + cx^{n-2} + c^2x^{n-3} + \cdots + c^{n-1}) \\
  &= c^{n-1} + c^{n-1} + \cdots + c^{n-1} = nc^{n-1}.
\end{align*}

(ii) If $g(x) = |x|$, then attempting to compute the derivative at $c = 0$ produces
the limit

\[ g'(0) = \lim_{x \to 0} \frac{|x|}{x}, \]

which is $+1$ or $-1$ depending on whether $x$ approaches zero from the right
or left. Consequently, this limit does not exist, and we conclude that $g$ is
not differentiable at zero.

Example 5.2.2 (ii) is a reminder that continuity of g does not imply that g 
is necessarily differentiable. On the other hand, if g is differentiable at a point, 
then it is true that g must be continuous at this point. 

Theorem 5.2.3. If $g : A \to \mathbf{R}$ is differentiable at a point $c \in A$, then $g$ is
continuous at $c$ as well.

Proof. We are assuming that

\[ g'(c) = \lim_{x \to c} \frac{g(x) - g(c)}{x - c} \]

exists, and we want to prove that $\lim_{x \to c} g(x) = g(c)$. But notice that the
Algebraic Limit Theorem for functional limits allows us to write

\[ \lim_{x \to c} (g(x) - g(c)) = \lim_{x \to c} \left( \frac{g(x) - g(c)}{x - c} \right)(x - c) = g'(c) \cdot 0 = 0. \]

It follows that $\lim_{x \to c} g(x) = g(c)$. □


Combinations of Differentiable Functions 


The Algebraic Limit Theorem (Theorem 2.3.3) led easily to the conclusion 
that algebraic combinations of continuous functions are continuous. With only 
slightly more work, we arrive at a similar conclusion for sums, products, and 
quotients of differentiable functions. 

Theorem 5.2.4 (Algebraic Differentiability Theorem). Let $f$ and $g$ be
functions defined on an interval $A$, and assume both are differentiable at some
point $c \in A$. Then,

(i) $(f + g)'(c) = f'(c) + g'(c)$,

(ii) $(kf)'(c) = kf'(c)$, for all $k \in \mathbf{R}$,

(iii) $(fg)'(c) = f'(c)g(c) + f(c)g'(c)$, and

(iv) $(f/g)'(c) = \dfrac{g(c)f'(c) - f(c)g'(c)}{[g(c)]^2}$, provided that $g(c) \ne 0$.

Proof. Statements (i) and (ii) are left as exercises. To prove (iii), we rewrite the
difference quotient as

\begin{align*}
\frac{(fg)(x) - (fg)(c)}{x - c}
  &= \frac{f(x)g(x) - f(x)g(c) + f(x)g(c) - f(c)g(c)}{x - c} \\
  &= f(x) \left[ \frac{g(x) - g(c)}{x - c} \right] + g(c) \left[ \frac{f(x) - f(c)}{x - c} \right].
\end{align*}

Because $f$ is differentiable at $c$, it is continuous there and thus $\lim_{x \to c} f(x) =
f(c)$. This fact, together with the functional-limit version of the Algebraic Limit
Theorem (Theorem 4.2.4), justifies the conclusion

\[ \lim_{x \to c} \frac{(fg)(x) - (fg)(c)}{x - c} = f(c)g'(c) + f'(c)g(c). \]

A similar proof of (iv) is possible, or we can use an argument based on the
next result. Each of these options is discussed in Exercise 5.2.3. □
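
The rewriting step in the proof of (iii) is an exact algebraic identity, valid for every $x \ne c$, and this can be checked with rational arithmetic. The sketch below assumes two sample polynomials of our own choosing, $f(x) = x^2 + 1$ and $g(x) = 3x - 2$, with $c = 1$; it is an illustration, not part of the proof.

```python
from fractions import Fraction

def f(x):
    return x * x + 1      # sample function, f'(x) = 2x

def g(x):
    return 3 * x - 2      # sample function, g'(x) = 3

def dq(func, x, c):
    """Difference quotient (func(x) - func(c)) / (x - c)."""
    return (func(x) - func(c)) / (x - c)

def fg(x):
    return f(x) * g(x)

c = Fraction(1)
expected = f(c) * 3 + (2 * c) * g(c)   # f(c) g'(c) + f'(c) g(c)

for k in (1, 3, 5):
    x = c + Fraction(1, 10**k)
    lhs = dq(fg, x, c)
    rhs = f(x) * dq(g, x, c) + g(c) * dq(f, x, c)
    # The rewritten form agrees exactly with the original difference quotient.
    assert lhs == rhs
    print(float(lhs), float(expected))
```

As $x$ moves toward $c$, the printed difference quotients settle toward the value predicted by the product rule.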

The composition of two differentiable functions also fortunately results in another
differentiable function. This fact is referred to as the Chain Rule. To discover
the proper formula for the derivative of the composition $g \circ f$, we can
write

\begin{align*}
(g \circ f)'(c) &= \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{x - c} \\
  &= \lim_{x \to c} \frac{g(f(x)) - g(f(c))}{f(x) - f(c)} \cdot \frac{f(x) - f(c)}{x - c} \\
  &= g'(f(c)) \cdot f'(c).
\end{align*}

With a little polish, this string of equations could qualify as a proof except for the
pesky fact that the $f(x) - f(c)$ expression causes problems in the denominator if
$f(x) = f(c)$ for $x$ values in arbitrarily small neighborhoods of $c$. (The function
$g_2(x)$ discussed in Section 5.1 exhibits this behavior near $c = 0$.) The upcoming
proof of the Chain Rule manages to finesse this problem but in content is essentially
the argument just given. Another approach is sketched in Exercise 5.2.4.

Theorem 5.2.5 (Chain Rule). Let $f : A \to \mathbf{R}$ and $g : B \to \mathbf{R}$ satisfy
$f(A) \subseteq B$ so that the composition $g \circ f$ is defined. If $f$ is differentiable at
$c \in A$ and if $g$ is differentiable at $f(c) \in B$, then $g \circ f$ is differentiable at $c$ with
$(g \circ f)'(c) = g'(f(c)) \cdot f'(c)$.

Proof. Because $g$ is differentiable at $f(c)$, we know that

\[ g'(f(c)) = \lim_{y \to f(c)} \frac{g(y) - g(f(c))}{y - f(c)}. \]

Another way to assert this same fact is to let $d(y)$ be the difference quotient

\[ d(y) = \frac{g(y) - g(f(c))}{y - f(c)}, \tag{1} \]

and observe that $\lim_{y \to f(c)} d(y) = g'(f(c))$. At the moment, $d(y)$ is not defined
when $y = f(c)$, but it should seem natural to declare that $d(f(c)) = g'(f(c))$,
so that $d$ is continuous at $f(c)$.

Now, we come to the finesse. Equation (1) can be rewritten as

\[ g(y) - g(f(c)) = d(y)(y - f(c)). \tag{2} \]

Observe that this equation holds for all $y \in B$ including $y = f(c)$. Thus, we
are free to substitute $y = f(t)$ for any arbitrary $t \in A$. If $t \ne c$, we can divide
equation (2) by $(t - c)$ to get

\[ \frac{g(f(t)) - g(f(c))}{t - c} = d(f(t)) \, \frac{f(t) - f(c)}{t - c} \]

for all $t \ne c$. Finally, taking the limit as $t \to c$ and applying the Algebraic Limit
Theorem together with Theorem 4.3.9 yields the desired formula. □
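
The point of the finesse is that the extended quotient $d$ never divides by zero, even at inputs $t \ne c$ where $f(t) = f(c)$. The sketch below illustrates this with sample functions of our own choosing, $f(t) = t^2$ and $g(y) = y^3$ at $c = 1$, where $t = -1$ is such an input; exact rational arithmetic confirms that both sides of the final displayed equation agree.

```python
from fractions import Fraction

def f(t):
    return t * t          # sample inner function, f'(t) = 2t

def g(y):
    return y ** 3         # sample outer function, g'(y) = 3y^2

c = Fraction(1)
fc = f(c)

def d(y):
    """Difference quotient of g at f(c), extended by d(f(c)) = g'(f(c))."""
    if y == fc:
        return 3 * fc ** 2
    return (g(y) - g(fc)) / (y - fc)

def lhs(t):
    """Difference quotient of the composition at c."""
    return (g(f(t)) - g(f(c))) / (t - c)

def rhs(t):
    """The factored form d(f(t)) * (f(t) - f(c)) / (t - c)."""
    return d(f(t)) * (f(t) - f(c)) / (t - c)

# t = -1 satisfies f(t) = f(c); the naive quotient would be 0/0 there.
for t in (Fraction(-1), Fraction(11, 10), Fraction(1001, 1000)):
    assert lhs(t) == rhs(t)
    print(float(lhs(t)))
```

For $t$ close to $c$ the printed values approach $g'(f(c)) \cdot f'(c)$, which for these sample functions is $3 \cdot 2 = 6$.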

Figure 5.3: The Interior Extremum Theorem.


Darboux’s Theorem 

One conclusion from this chapter’s introduction is that although continuity is 
necessary for the derivative to exist, it is not the case that the derivative function 
itself will always be continuous. Our specific example was $g_2(x) = x^2 \sin(1/x)$,
where we set $g_2(0) = 0$. By tinkering with the exponent of the leading $x^2$ factor,
it is possible to construct examples of differentiable functions with derivatives 
that are unbounded, or twice-differentiable functions that have discontinuous 
second derivatives (Exercise 5.2.7). The underlying principle in all of these 
examples is that by controlling the size of the oscillations of the original function, 
we can make the corresponding oscillations of the slopes volatile enough to 
prevent the existence of the relevant limits. 

It is significant that for this class of examples, the discontinuities that arise 
are never simple jump discontinuities. (A precise definition of “jump discon- 
tinuity” is presented in Section 4.6.) We are now ready to confirm our earlier 
suspicions that although derivatives do not in general have to be continuous, 
they do possess the intermediate value property. (See Definition 4.5.3.) This 
surprising observation is a fairly straightforward corollary to the more obvious 
observation that differentiable functions attain maximums and minimums only 
at points where the derivative is equal to zero (Fig. 5.3). 

Theorem 5.2.6 (Interior Extremum Theorem). Let $f$ be differentiable on
an open interval $(a, b)$. If $f$ attains a maximum value at some point $c \in (a, b)$
(i.e., $f(c) \ge f(x)$ for all $x \in (a, b)$), then $f'(c) = 0$. The same is true if $f(c)$ is
a minimum value.


Proof. Because $c$ is in the open interval $(a, b)$, we can construct two sequences
$(x_n)$ and $(y_n)$, which converge to $c$ and satisfy $x_n < c < y_n$ for all $n \in \mathbf{N}$. The
fact that $f(c)$ is a maximum implies that $f(y_n) - f(c) \le 0$ for all $n$, and thus

\[ f'(c) = \lim_{n \to \infty} \frac{f(y_n) - f(c)}{y_n - c} \le 0 \]

by the Order Limit Theorem (Theorem 2.3.4). In a similar way,

\[ \frac{f(x_n) - f(c)}{x_n - c} \ge 0 \]

for each $x_n$ because the numerator is at most zero and the denominator is negative.
This implies that

\[ f'(c) = \lim_{n \to \infty} \frac{f(x_n) - f(c)}{x_n - c} \ge 0, \]

and therefore $f'(c) = 0$, as desired. □


The Interior Extremum Theorem is the fundamental fact behind the use of 
the derivative as a tool for solving applied optimization problems. This idea, 
discovered and exploited by Pierre de Fermat, is as old as the derivative itself. 
In a sense, finding maximums and minimums is arguably why Fermat invented 
his method of finding slopes of tangent lines. It was 200 years later that the 
French mathematician Gaston Darboux (1842-1917) pointed out that Fermat’s 
method of finding maximums and minimums carries with it the implication that 
if a derivative function attains two distinct values $f'(a)$ and $f'(b)$, then it must
also attain every value in between. 

Theorem 5.2.7 (Darboux’s Theorem). If $f$ is differentiable on an interval
$[a, b]$, and if $\alpha$ satisfies $f'(a) < \alpha < f'(b)$ (or $f'(a) > \alpha > f'(b)$), then there
exists a point $c \in (a, b)$ where $f'(c) = \alpha$.

Proof. We first simplify matters by defining a new function $g(x) = f(x) - \alpha x$
on $[a, b]$. Notice that $g$ is differentiable on $[a, b]$ with $g'(x) = f'(x) - \alpha$. In terms
of $g$, our hypothesis states that $g'(a) < 0 < g'(b)$, and we hope to show that
$g'(c) = 0$ for some $c \in (a, b)$.

The remainder of the argument is outlined in Exercise 5.2.11. □ 
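
Darboux’s Theorem applies to the discontinuous derivative $g_2'$ from Section 5.1, and the sketch below illustrates what it predicts: the value $\alpha = 1/2$ is attained at points arbitrarily close to zero. The search uses bisection on small intervals that avoid zero, where $g_2'$ happens to be continuous; that continuity is a convenience of this example and an assumption of the code, not a hypothesis of the theorem.

```python
import math

def g2_prime(x):
    """Derivative of g_2(x) = x^2 sin(1/x), with g_2'(0) = 0."""
    return -math.cos(1 / x) + 2 * x * math.sin(1 / x) if x != 0 else 0.0

def attain(alpha, lo, hi, steps=200):
    """Bisection for g_2'(c) = alpha on [lo, hi], assuming g_2' - alpha changes sign there."""
    flo = g2_prime(lo) - alpha
    for _ in range(steps):
        mid = (lo + hi) / 2
        fmid = g2_prime(mid) - alpha
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid   # sign change lies in the right half
        else:
            hi = mid              # sign change lies in the left half
    return (lo + hi) / 2

alpha = 0.5
# For each k, g_2' is near -1 at 1/(2k*pi) and near +1 at 1/((2k-1)*pi).
roots = [attain(alpha, 1 / (2 * k * math.pi), 1 / ((2 * k - 1) * math.pi))
         for k in (1, 10, 100)]

print(roots)
print([g2_prime(c) for c in roots])
```

Increasing `k` pushes the located points toward zero, which is consistent with the intermediate value property holding on every interval $[0, b]$ despite the discontinuity of $g_2'$ at zero.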


Exercises 

Exercise 5.2.1. Supply proofs for parts (i) and (ii) of Theorem 5.2.4. 

Exercise 5.2.2. Exactly one of the following requests is impossible. Decide 
which it is, and provide examples for the other three. In each case, let’s assume 
the functions are defined on all of R. 

(a) Functions $f$ and $g$ not differentiable at zero but where $fg$ is differentiable
at zero.

(b) A function $f$ not differentiable at zero and a function $g$ differentiable at
zero where $fg$ is differentiable at zero.

(c) A function $f$ not differentiable at zero and a function $g$ differentiable at
zero where $f + g$ is differentiable at zero.

(d) A function $f$ differentiable at zero but not differentiable at any other point.

Exercise 5.2.3. (a) Use Definition 5.2.1 to produce the proper formula for
the derivative of $h(x) = 1/x$.

(b) Combine the result in part (a) with the Chain Rule (Theorem 5.2.5) to 
supply a proof for part (iv) of Theorem 5.2.4. 

(c) Supply a direct proof of Theorem 5.2.4 (iv) by algebraically manipulat- 
ing the difference quotient for (f/g) in a style similar to the proof of 
Theorem 5.2.4 (iii). 

Exercise 5.2.4. Follow these steps to provide a slightly modified proof of the 
Chain Rule. 

(a) Show that a function $h : A \to \mathbf{R}$ is differentiable at $a \in A$ if and only if
there exists a function $l : A \to \mathbf{R}$ which is continuous at $a$ and satisfies

\[ h(x) - h(a) = l(x)(x - a) \quad \text{for all } x \in A. \]


(b) Use this criterion for differentiability (in both directions) to prove Theorem 
5.2.5. 


Exercise 5.2.5. Let

\[
f_a(x) =
\begin{cases}
x^a & \text{if } x > 0 \\
0 & \text{if } x \le 0.
\end{cases}
\]

(a) For which values of $a$ is $f$ continuous at zero?

(b) For which values of $a$ is $f$ differentiable at zero? In this case, is the
derivative function continuous?

(c) For which values of $a$ is $f$ twice-differentiable?

Exercise 5.2.6. Let $g$ be defined on an interval $A$, and let $c \in A$.

(a) Explain why $g'(c)$ in Definition 5.2.1 could have been given by

\[ g'(c) = \lim_{h \to 0} \frac{g(c + h) - g(c)}{h}. \]

(b) Assume $A$ is open. If $g$ is differentiable at $c \in A$, show

\[ g'(c) = \lim_{h \to 0} \frac{g(c + h) - g(c - h)}{2h}. \]


Exercise 5.2.7. Let

\[
g_a(x) =
\begin{cases}
x^a \sin(1/x) & \text{if } x \ne 0 \\
0 & \text{if } x = 0.
\end{cases}
\]

Find a particular (potentially noninteger) value for $a$ so that

(a) $g_a$ is differentiable on $\mathbf{R}$ but such that $g_a'$ is unbounded on $[0, 1]$.

(b) $g_a$ is differentiable on $\mathbf{R}$ with $g_a'$ continuous but not differentiable at zero.

(c) $g_a$ is differentiable on $\mathbf{R}$ and $g_a'$ is differentiable on $\mathbf{R}$, but such that $g_a''$ is
not continuous at zero.


Exercise 5.2.8. Review the definition of uniform continuity (Definition 4.4.4). 
Given a differentiable function $f : A \to \mathbf{R}$, let’s say that $f$ is uniformly differentiable
on $A$ if, given $\epsilon > 0$ there exists a $\delta > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} - f'(y) \right| < \epsilon \quad \text{whenever } 0 < |x - y| < \delta. \]

(a) Is $f(x) = x^2$ uniformly differentiable on $\mathbf{R}$? How about $g(x) = x^3$?

(b) Show that if a function is uniformly differentiable on an interval A, then 
the derivative must be continuous on A. 

(c) Is there a theorem analogous to Theorem 4.4.7 for differentiation? Are 
functions that are differentiable on a closed interval [a, b] necessarily uni- 
formly differentiable? 

Exercise 5.2.9. Decide whether each conjecture is true or false. Provide an 
argument for those that are true and a counterexample for each one that is false. 

(a) If $f'$ exists on an interval and is not constant, then $f'$ must take on some
irrational values.

(b) If $f'$ exists on an open interval and there is some point $c$ where $f'(c) > 0$,
then there exists a $\delta$-neighborhood $V_\delta(c)$ around $c$ in which $f'(x) > 0$ for
all $x \in V_\delta(c)$.

(c) If $f$ is differentiable on an interval containing zero and if $\lim_{x \to 0} f'(x) = L$,
then it must be that $L = f'(0)$.


Exercise 5.2.10. Recall that a function $f : (a, b) \to \mathbf{R}$ is increasing on $(a, b)$
if $f(x) \le f(y)$ whenever $x < y$ in $(a, b)$. A familiar mantra from calculus is
that a differentiable function is increasing if its derivative is positive, but this
statement requires some sharpening in order to be completely accurate.

Show that the function

\[
g(x) =
\begin{cases}
x/2 + x^2 \sin(1/x) & \text{if } x \ne 0 \\
0 & \text{if } x = 0
\end{cases}
\]

is differentiable on $\mathbf{R}$ and satisfies $g'(0) > 0$. Now, prove that $g$ is not increasing
over any open interval containing 0.

In the next section we will see that $f$ is indeed increasing on $(a, b)$ if and
only if $f'(x) \ge 0$ for all $x \in (a, b)$.

Exercise 5.2.11. Assume that $g$ is differentiable on $[a, b]$ and satisfies $g'(a) <
0 < g'(b)$.

(a) Show that there exists a point $x \in (a, b)$ where $g(a) > g(x)$, and a point
$y \in (a, b)$ where $g(y) < g(b)$.

(b) Now complete the proof of Darboux’s Theorem started earlier.

Figure 5.4: The Mean Value Theorem.


Exercise 5.2.12 (Inverse functions). If $f : [a, b] \to \mathbf{R}$ is one-to-one, then
there exists an inverse function $f^{-1}$ defined on the range of $f$ given by $f^{-1}(y) =
x$ where $y = f(x)$. In Exercise 4.5.8 we saw that if $f$ is continuous on $[a, b]$,
then $f^{-1}$ is continuous on its domain. Let’s add the assumption that $f$ is
differentiable on $[a, b]$ with $f'(x) \ne 0$ for all $x \in [a, b]$. Show $f^{-1}$ is differentiable
with

\[ (f^{-1})'(y) = \frac{1}{f'(x)} \quad \text{where } y = f(x). \]


5.3 The Mean Value Theorems 

The Mean Value Theorem (Fig. 5.4) makes the geometrically plausible assertion
that a differentiable function $f$ on an interval $[a, b]$ will, at some point, attain a
slope equal to the slope of the line through the endpoints $(a, f(a))$ and $(b, f(b))$.
More tersely put,

\[ f'(c) = \frac{f(b) - f(a)}{b - a} \]

for at least one point $c \in (a, b)$.

On the surface, there does not appear to be anything especially remarkable 
about this observation. Its validity appears undeniable — much like the Inter- 
mediate Value Theorem for continuous functions — and its proof is rather short. 
The ease of the proof, however, is misleading, as it is built on top of some 
hard-fought accomplishments from the study of limits and continuity. In this 
regard, the Mean Value Theorem is a kind of reward for a job well done. As we 
will see, it is a prize of exceptional value. Although the result itself is geomet- 
rically unsurprising, the Mean Value Theorem is the cornerstone of the proof 
for almost every major theorem pertaining to differentiation. We will use it to 
prove L’Hospital’s rules regarding limits of quotients of differentiable functions.
A rigorous analysis of how infinite series of functions behave when differentiated
requires the Mean Value Theorem (Theorem 6.4.3), and it is the crucial step in
the proof of the Fundamental Theorem of Calculus (Theorem 7.5.1). It is also
the fundamental concept underlying Lagrange’s Remainder Theorem (Theorem
6.6.3) which approximates the error between a Taylor polynomial and the function
that generates it.

Figure 5.5: Rolle’s Theorem.

The Mean Value Theorem can be stated in various degrees of generality, 
each one important enough to be given its own special designation. Recall that 
the Extreme Value Theorem (Theorem 4.4.2) states that continuous functions 
on compact sets always attain maximum and minimum values. Combining this 
observation with the Interior Extremum Theorem for differentiable functions 
(Theorem 5.2.6) yields a special case of the Mean Value Theorem first noted by 
the mathematician Michel Rolle (1652-1719) (Fig. 5.5). 


Theorem 5.3.1 (Rolle’s Theorem). Let $f : [a, b] \to \mathbf{R}$ be continuous on $[a, b]$
and differentiable on $(a, b)$. If $f(a) = f(b)$, then there exists a point $c \in (a, b)$
where $f'(c) = 0$.


Proof. Because $f$ is continuous on a compact set, $f$ attains a maximum and a
minimum. If both the maximum and minimum occur at the endpoints, then $f$
is necessarily a constant function and $f'(x) = 0$ on all of $(a, b)$. In this case, we
can choose $c$ to be any point we like. On the other hand, if either the maximum
or minimum occurs at some point $c$ in the interior $(a, b)$, then it follows from
the Interior Extremum Theorem (Theorem 5.2.6) that $f'(c) = 0$. □


Theorem 5.3.2 (Mean Value Theorem). If $f : [a, b] \to \mathbf{R}$ is continuous on
$[a, b]$ and differentiable on $(a, b)$, then there exists a point $c \in (a, b)$ where

\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]

Proof. Notice that the Mean Value Theorem reduces to Rolle’s Theorem in the
case where $f(a) = f(b)$. The strategy of the proof is to reduce the more general
statement to this special case.

The equation of the line through $(a, f(a))$ and $(b, f(b))$ is

\[ y = \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a). \]

We want to consider the difference between this line and the function $f(x)$. To
this end, let

\[ d(x) = f(x) - \left[ \left( \frac{f(b) - f(a)}{b - a} \right)(x - a) + f(a) \right], \]

and observe that $d$ is continuous on $[a, b]$, differentiable on $(a, b)$, and satisfies
$d(a) = 0 = d(b)$. Thus, by Rolle’s Theorem, there exists a point $c \in (a, b)$ where
$d'(c) = 0$. Because

\[ d'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}, \]

we get

\[ 0 = f'(c) - \frac{f(b) - f(a)}{b - a}, \]

which completes the proof. □
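
The proof is an existence argument, but for a concrete function the point $c$ can be located numerically. The sketch below assumes the sample function $f(x) = x^3$ on $[0, 2]$, chosen by us for illustration, and finds a zero of $d'$ by bisection; bisection works here only because $d'$ happens to be continuous and changes sign on the interval, which the theorem itself does not require.

```python
import math

def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)      # slope of the line through the endpoints

def d_prime(x):
    """Derivative of d(x) = f(x) - [slope * (x - a) + f(a)]."""
    return f_prime(x) - slope

# Here d'(a) < 0 < d'(b) and d' is continuous, so bisection locates a zero of d'.
lo, hi = a, b
for _ in range(100):
    mid = (lo + hi) / 2
    if d_prime(mid) < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

print(c, f_prime(c), slope)
```

For this sample function the equation $3c^2 = 4$ can also be solved by hand, giving $c = 2/\sqrt{3}$, which is a useful cross-check on the numerical value.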


The point has been made that the Mean Value Theorem manages to find its 
way into nearly every proof of any statement related to the geometrical nature 
of the derivative. As a simple example, if $f$ is a constant function $f(x) = k$ on
some interval $A$, then a straightforward calculation of $f'$ using Definition 5.2.1
shows that $f'(x) = 0$ for all $x \in A$. But how do we prove the converse statement?
If we know that a differentiable function g satisfies g'(x) = 0 everywhere on A, 
our intuition suggests that we should be able to prove g(x) is constant. It is the 
Mean Value Theorem that provides us with a way to articulate rigorously what 
seems geometrically valid. 


Corollary 5.3.3. If $g : A \to \mathbf{R}$ is differentiable on an interval $A$ and satisfies
$g'(x) = 0$ for all $x \in A$, then $g(x) = k$ for some constant $k \in \mathbf{R}$.

Proof. Take $x, y \in A$ and assume $x < y$. Applying the Mean Value Theorem to
$g$ on the interval $[x, y]$, we see that

\[ g'(c) = \frac{g(y) - g(x)}{y - x} \]

for some $c \in A$. Now, $g'(c) = 0$, so we conclude that $g(y) = g(x)$. Set $k$ equal
to this common value. Because $x$ and $y$ are arbitrary, it follows that $g(x) = k$
for all $x \in A$. □

Corollary 5.3.4. If $f$ and $g$ are differentiable functions on an interval $A$ and
satisfy $f'(x) = g'(x)$ for all $x \in A$, then $f(x) = g(x) + k$ for some constant
$k \in \mathbf{R}$.

Proof. Let $h(x) = f(x) - g(x)$ and apply Corollary 5.3.3 to the differentiable
function $h$. □

The Mean Value Theorem has a more general form due to Cauchy. It is this
generalized version of the theorem that is needed to analyze L’Hospital’s rules
and Lagrange’s Remainder Theorem. 

Theorem 5.3.5 (Generalized Mean Value Theorem). If $f$ and $g$ are continuous
on the closed interval $[a, b]$ and differentiable on the open interval $(a, b)$,
then there exists a point $c \in (a, b)$ where

\[ [f(b) - f(a)]g'(c) = [g(b) - g(a)]f'(c). \]

If $g'$ is never zero on $(a, b)$, then the conclusion can be stated as

\[ \frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}. \]

Proof. This result follows by applying the Mean Value Theorem to the function
$h(x) = [f(b) - f(a)]g(x) - [g(b) - g(a)]f(x)$. The details are requested in
Exercise 5.3.5. □
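
The auxiliary function $h$ takes the same value at both endpoints, which is what makes the reduction work. The sketch below illustrates this for the sample pair $f(x) = x^3$, $g(x) = x^2$ on $[1, 2]$, an assumption made purely for illustration, and locates a point $c$ with $h'(c) = 0$ by bisection (again relying on the continuity of $h'$ in this particular example).

```python
def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

a, b = 1.0, 2.0
df, dg = f(b) - f(a), g(b) - g(a)

def h(x):
    """Auxiliary function h(x) = [f(b) - f(a)] g(x) - [g(b) - g(a)] f(x)."""
    return df * g(x) - dg * f(x)

def h_prime(x):
    return df * g_prime(x) - dg * f_prime(x)

# For this pair h'(a) > 0 > h'(b), so bisection finds an interior zero of h'.
lo, hi = a, b
for _ in range(100):
    mid = (lo + hi) / 2
    if h_prime(mid) > 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

print(h(a), h(b))
print(c, df * g_prime(c), dg * f_prime(c))
```

At the located point the two sides $[f(b) - f(a)]g'(c)$ and $[g(b) - g(a)]f'(c)$ agree up to rounding, as the theorem asserts.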


L’Hospital’s Rules 


The Algebraic Limit Theorem asserts that when taking a limit of a quotient of 
functions we can write 


\[ \lim_{x \to c} \frac{f(x)}{g(x)} = \frac{\lim_{x \to c} f(x)}{\lim_{x \to c} g(x)}, \]

provided that each individual limit exists and $\lim_{x \to c} g(x)$ is not zero. If the
denominator does converge to zero and the numerator has a nonzero limit, 
then it is not difficult to argue that the quotient f(x)/g(x) grows in absolute 
value without bound as x approaches c. L’Hospital’s Rules are named for the 
Marquis de L’Hospital (1661-1704), who learned the results from his tutor, 
Johann Bernoulli (1667-1748), and published them in 1696 in what is regarded 
as the first calculus text. Stated in different levels of generality, they are an 
effective tool for handling the indeterminate cases when either numerator and
denominator both tend to zero or both tend simultaneously to infinity.

Theorem 5.3.6 (L’Hospital’s Rule: 0/0 case). Let $f$ and $g$ be continuous
on an interval containing $a$, and assume $f$ and $g$ are differentiable on this
interval with the possible exception of the point $a$. If $f(a) = g(a) = 0$ and
$g'(x) \ne 0$ for all $x \ne a$, then

\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]

Proof. This argument follows from a straightforward application of the Gener- 
alized Mean Value Theorem. It is requested as Exercise 5.3.11. □ 
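
A numerical table can make the 0/0 case tangible. The sketch below assumes the sample pair $f(x) = 1 - \cos x$ and $g(x) = x^2$ at $a = 0$, which satisfies the hypotheses with $g'(x) = 2x \ne 0$ for $x \ne 0$; it is an illustration of the statement, not a substitute for the proof, and floating-point cancellation limits how small $x$ can usefully be taken.

```python
import math

def f(x):
    return 1 - math.cos(x)

def g(x):
    return x * x

def f_prime(x):
    return math.sin(x)

def g_prime(x):
    return 2 * x

# Each row holds x, f'(x)/g'(x), and f(x)/g(x) for x approaching a = 0.
rows = [(x, f_prime(x) / g_prime(x), f(x) / g(x)) for x in (1e-1, 1e-2, 1e-3)]

for x, ratio_of_derivatives, ratio in rows:
    print(x, ratio_of_derivatives, ratio)
```

Both columns drift toward the same value, here $L = 1/2$, as $x$ shrinks.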


L’Hospital’s Rule remains true if we replace the assumption that $f(a) =
g(a) = 0$ with the hypothesis that $\lim_{x \to a} g(x) = \infty$. To this point we have not
been explicit about what it means to say that a limit equals $\infty$. The logical
structure of such a definition is precisely the same as it is for finite functional
limits. The difference is that rather than trying to force the function to take
on values in some small $\epsilon$-neighborhood around a proposed limit, we must show
that $g(x)$ eventually exceeds any proposed upper bound. The arbitrarily small
$\epsilon > 0$ is replaced by an arbitrarily large $M > 0$.


Definition 5.3.7. Given $g : A \to \mathbf{R}$ and a limit point $c$ of $A$, we say that
$\lim_{x \to c} g(x) = \infty$ if, for every $M > 0$, there exists a $\delta > 0$ such that whenever
$0 < |x - c| < \delta$ it follows that $g(x) > M$.

We can define $\lim_{x \to c} g(x) = -\infty$ in a similar way.
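
The definition is a challenge-and-response game: $M$ is proposed, and a suitable $\delta$ must be produced. The sketch below plays this game for the sample function $g(x) = 1/x^2$ at $c = 0$, an example of our own choosing, where $\delta = 1/\sqrt{M}$ is one response that works; the helper name `delta_for` is hypothetical.

```python
import math

def g(x):
    return 1 / (x * x)

def delta_for(M):
    """One valid response for g(x) = 1/x^2 at c = 0: 0 < |x| < delta forces g(x) > M."""
    return 1 / math.sqrt(M)

for M in (10.0, 1e4, 1e8):
    delta = delta_for(M)
    # Sample points on both sides of c = 0 with 0 < |x| < delta.
    samples = [s * delta * t for s in (-1, 1) for t in (0.999, 0.5, 0.01)]
    assert all(g(x) > M for x in samples)
    print(M, delta)
```

Sampling finitely many points does not prove the implication, but it shows how the response $\delta$ must shrink as the challenge $M$ grows.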


The following version of L’Hospital’s Rule is typically referred to as the $\infty/\infty$
case even though the hypothesis only requires that the function in the denomi- 
nator tend to infinity. To simplify the notation of the proof, we state the result 
using a one-sided limit. 


Theorem 5.3.8 (L’Hospital’s Rule: $\infty/\infty$ case). Assume $f$ and $g$ are
differentiable on $(a, b)$ and that $g'(x) \ne 0$ for all $x \in (a, b)$. If $\lim_{x \to a} g(x) = \infty$
(or $-\infty$), then

\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = L \quad \text{implies} \quad \lim_{x \to a} \frac{f(x)}{g(x)} = L. \]

Proof. Let $\epsilon > 0$. Because $\lim_{x \to a} \frac{f'(x)}{g'(x)} = L$, there exists a $\delta_1 > 0$ such that

\[ \left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\epsilon}{2} \]

for all $a < x < a + \delta_1$. For convenience of notation, let $t = a + \delta_1$ and note that
$t$ is fixed for the remainder of the argument.

Our functions are not defined at $a$, but for any $x \in (a, t)$ we can apply the
Generalized Mean Value Theorem on the interval $[x, t]$ to get

\[ \frac{f(x) - f(t)}{g(x) - g(t)} = \frac{f'(c)}{g'(c)} \]

for some $c \in (x, t)$. Our choice of $t$ then implies

\[ L - \frac{\epsilon}{2} < \frac{f(x) - f(t)}{g(x) - g(t)} < L + \frac{\epsilon}{2} \tag{1} \]

for all $x$ in $(a, t)$.

In an effort to isolate the fraction $\frac{f(x)}{g(x)}$, the strategy is to multiply inequality
(1) by $(g(x) - g(t))/g(x)$. We need to be sure, however, that this quantity is
positive, which amounts to insisting that $1 > g(t)/g(x)$. Because $t$ is fixed and
$\lim_{x \to a} g(x) = \infty$, we can choose $\delta_2 > 0$ so that $g(x) > g(t)$ for all $a < x < a + \delta_2$.
Carrying out the desired multiplication results in

\[ \left( L - \frac{\epsilon}{2} \right) \left( 1 - \frac{g(t)}{g(x)} \right) < \frac{f(x) - f(t)}{g(x)} < \left( L + \frac{\epsilon}{2} \right) \left( 1 - \frac{g(t)}{g(x)} \right), \]

which after some algebraic manipulations yields

\[ L - \frac{\epsilon}{2} + \frac{-Lg(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} < \frac{f(x)}{g(x)} < L + \frac{\epsilon}{2} + \frac{-Lg(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)}. \]

Again, let’s remind ourselves that $t$ is fixed and that $\lim_{x \to a} g(x) = \infty$. Thus,
we can choose a $\delta_3$ such that $a < x < a + \delta_3$ implies that $g(x)$ is large enough
to ensure that both

\[ \frac{-Lg(t) + \frac{\epsilon}{2} g(t) + f(t)}{g(x)} \quad \text{and} \quad \frac{-Lg(t) - \frac{\epsilon}{2} g(t) + f(t)}{g(x)} \]

are less than $\epsilon/2$ in absolute value. Putting this all together and choosing
$\delta = \min\{\delta_1, \delta_2, \delta_3\}$ guarantees that

\[ \left| \frac{f(x)}{g(x)} - L \right| < \epsilon \]

for all $a < x < a + \delta$. □
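
The statement of the theorem can be illustrated with a one-sided limit at $a = 0$. The sketch below assumes the sample pair $f(x) = \ln x$ and $g(x) = 1/x$ on $(0, 1)$, chosen by us: here $g(x)$ tends to infinity as $x$ approaches zero from the right, $g'(x) = -1/x^2$ is never zero, and $f'(x)/g'(x) = -x$ tends to $L = 0$.

```python
import math

def f(x):
    return math.log(x)

def g(x):
    return 1 / x

def f_prime(x):
    return 1 / x

def g_prime(x):
    return -1 / (x * x)

# Each row holds x, f'(x)/g'(x), and f(x)/g(x) for x approaching 0 from the right.
rows = [(x, f_prime(x) / g_prime(x), f(x) / g(x)) for x in (1e-2, 1e-4, 1e-6)]

for x, ratio_of_derivatives, ratio in rows:
    print(x, ratio_of_derivatives, ratio)
```

The quotient $f(x)/g(x) = x \ln x$ approaches zero more slowly than $f'(x)/g'(x)$ does, which is consistent with the proof: the theorem guarantees a common limit, not a common rate.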


Exercises 


Exercise 5.3.1. Recall from Exercise 4.4.9 that a function $f : A \to \mathbf{R}$ is
Lipschitz on $A$ if there exists an $M > 0$ such that

\[ \left| \frac{f(x) - f(y)}{x - y} \right| \le M \]

for all $x \ne y$ in $A$.

(a) Show that if $f$ is differentiable on a closed interval $[a, b]$ and if $f'$ is continuous
on $[a, b]$, then $f$ is Lipschitz on $[a, b]$.

(b) Review the definition of a contractive function in Exercise 4.3.11. If we
add the assumption that $|f'(x)| < 1$ on $[a, b]$, does it follow that $f$ is
contractive on this set?

Exercise 5.3.2. Let $f$ be differentiable on an interval $A$. If $f'(x) \ne 0$ on $A$,
show that $f$ is one-to-one on $A$. Provide an example to show that the converse
statement need not be true.


Exercise 5.3.3. Let $h$ be a differentiable function defined on the interval $[0, 3]$,
and assume that $h(0) = 1$, $h(1) = 2$, and $h(3) = 2$.

(a) Argue that there exists a point $d \in [0, 3]$ where $h(d) = d$.

(b) Argue that at some point $c$ we have $h'(c) = 1/3$.

(c) Argue that $h'(x) = 1/4$ at some point in the domain.

Exercise 5.3.4. Let $f$ be differentiable on an interval $A$ containing zero, and
assume $(x_n)$ is a sequence in $A$ with $(x_n) \to 0$ and $x_n \ne 0$.

(a) If $f(x_n) = 0$ for all $n \in \mathbf{N}$, show $f(0) = 0$ and $f'(0) = 0$.

(b) Add the assumption that $f$ is twice-differentiable at zero and show that
$f''(0) = 0$ as well.

Exercise 5.3.5. (a) Supply the details for the proof of Cauchy’s Generalized
Mean Value Theorem (Theorem 5.3.5).

(b) Give a graphical interpretation of the Generalized Mean Value Theorem
analogous to the one given for the Mean Value Theorem at the beginning
of Section 5.3. (Consider $f$ and $g$ as parametric equations for a curve.)


Exercise 5.3.6. (a) Let $g : [0, a] \to \mathbf{R}$ be differentiable, $g(0) = 0$, and
$|g'(x)| \le M$ for all $x \in [0, a]$. Show $|g(x)| \le Mx$ for all $x \in [0, a]$.

(b) Let $h : [0, a] \to \mathbf{R}$ be twice differentiable, $h'(0) = h(0) = 0$ and $|h''(x)| \le
M$ for all $x \in [0, a]$. Show $|h(x)| \le Mx^2/2$ for all $x \in [0, a]$.

(c) Conjecture and prove an analogous result for a function that is differentiable
three times on $[0, a]$.


Exercise 5.3.7. A fixed point of a function $f$ is a value $x$ where $f(x) = x$.
Show that if $f$ is differentiable on an interval with $f'(x) \ne 1$, then $f$ can have
at most one fixed point.


Exercise 5.3.8. Assume $f$ is continuous on an interval containing zero and
differentiable for all $x \neq 0$. If $\lim_{x \to 0} f'(x) = L$, show $f'(0)$ exists and equals $L$.

Exercise 5.3.9. Assume $f$ and $g$ are as described in Theorem 5.3.6, but now
add the assumption that $f$ and $g$ are differentiable at $a$, and $f'$ and $g'$ are
continuous at $a$ with $g'(a) \neq 0$. Find a short proof for the 0/0 case of L’Hospital’s
Rule under this stronger hypothesis.

Exercise 5.3.10. Let $f(x) = x \sin(1/x^4)\, e^{-1/x^2}$ and $g(x) = e^{-1/x^2}$. Using the
familiar properties of these functions, compute the limit as $x$ approaches zero of
$f(x)$, $g(x)$, $f(x)/g(x)$, and $f'(x)/g'(x)$. Explain why the results are surprising
but not in conflict with the content of Theorem 5.3.6. 1


Exercise 5.3.11. (a) Use the Generalized Mean Value Theorem to furnish a
proof of the 0/0 case of L’Hospital’s Rule (Theorem 5.3.6).

(b) If we keep the first part of the hypothesis of Theorem 5.3.6 the same but
we assume that
\[ \lim_{x \to a} \frac{f'(x)}{g'(x)} = \infty, \]
does it necessarily follow that
\[ \lim_{x \to a} \frac{f(x)}{g(x)} = \infty? \]


Exercise 5.3.12. If $f$ is twice differentiable on an open interval containing $a$
and $f''$ is continuous at $a$, show
\[ \lim_{h \to 0} \frac{f(a + h) - 2f(a) + f(a - h)}{h^2} = f''(a). \]


(Compare this to Exercise 5.2.6(b).) 


5.4 A Continuous Nowhere-Differentiable
Function

Exploring the relationship between continuity and differentiability has led to 
both fruitful results and pathological counterexamples. The bulk of discussion 
to this point has focused on the continuity of derivatives, but historically a sig- 
nificant amount of debate revolved around the question of whether continuous 
functions were necessarily differentiable. Early in the chapter, we saw that con- 
tinuity was a requirement for differentiability, but, as the absolute value function 
demonstrates, the converse of this proposition is not true. A function can be 
continuous but not differentiable at some point. But just how nondifferentiable 
can a continuous function be? Given a finite set of points, it is not difficult to 
imagine how to construct a graph with corners at each of these points, so that 
the corresponding function fails to be differentiable on this finite set. The trick 
gets more difficult, however, when the set becomes infinite. For instance, is it 
possible to construct a function that is continuous on all of R but fails to be 
differentiable at every rational point? Not only is this possible, but the situation 
is even more disconcerting. In 1872, Karl Weierstrass presented an example of 
a continuous function that was not differentiable at any point. (It seems to be
the case that Bernhard Bolzano had his own example of such a beast as early
as 1830, but it was not published until much later.)

1 A large class of “counterexamples” of this sort to L’Hospital’s Rule are explored in [4].

Figure 5.6: The function $h(x)$.

Weierstrass actually discovered a class of nowhere-differentiable functions of 
the form 

\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x) \]


where the values of a and b are carefully chosen. Such functions are specific 
examples of Fourier series discussed in Section 8.5. The details of Weierstrass’ 
argument are simplified if we replace the cosine function with a piecewise linear 
function that has oscillations qualitatively like cos(x). 

Define
\[ h(x) = |x| \]
on the interval $[-1, 1]$ and extend the definition of $h$ to all of $\mathbf{R}$ by requiring
that $h(x + 2) = h(x)$. The result is a periodic “sawtooth” function (Fig. 5.6).


Exercise 5.4.1. Sketch a graph of $(1/2)h(2x)$ on $[-2, 3]$. Give a qualitative
description of the functions
\[ h_n(x) = \frac{1}{2^n} h(2^n x) \]
as $n$ gets larger.


Now, define
\[ g(x) = \sum_{n=0}^{\infty} h_n(x) = \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x). \]


The claim is that g(x) is continuous on all of R but fails to be differentiable at 
any point. 


Infinite Series of Functions and Continuity 

The definition of $g(x)$ is a significant departure from the way we usually define
functions. For each $x \in \mathbf{R}$, $g(x)$ is defined to be the value of an infinite series.

Figure 5.7: A sketch of $g(x) = \sum_{n=0}^{\infty} (1/2^n) h(2^n x)$.


Exercise 5.4.2. Fix $x \in \mathbf{R}$. Argue that the series
\[ \sum_{n=0}^{\infty} \frac{1}{2^n} h(2^n x) \]
converges absolutely and thus $g(x)$ is properly defined.

Exercise 5.4.3. Taking the continuity of $h(x)$ as given, reference the proper
theorems from Chapter 4 that imply that the finite sum
\[ g_m(x) = \sum_{n=0}^{m} \frac{1}{2^n} h(2^n x) \]
is continuous on $\mathbf{R}$.

This brings us to an archetypical question in analysis: When do conclusions 
that are valid in finite settings extend to infinite ones? A finite sum of continuous 
functions is certainly continuous, but does this necessarily hold for an infinite 
sum of continuous functions? In general, we will see that this is not always the 
case. For this particular sum, however, the continuity of the limit function $g(x)$
can be proved. Deciphering when results about finite sums of functions extend 
to infinite sums is one of the fundamental themes of Chapter 6. Although a 
self-contained argument for the continuity of g is not beyond our means at this 
point, we will nevertheless postpone the proof (see, for example, Exercise 6.4.3), 
leaving it as an enticement for the upcoming study of uniform convergence. 

Exercise 5.4.4. As the graph in Figure 5.7 suggests, the structure of g(x) is 
quite intricate. Answer the following questions, assuming that $g(x)$ is indeed
continuous. 

(a) How do we know g attains a maximum value M on [0,2]? What is this 
value? 

(b) Let $D$ be the set of points in $[0, 2]$ where $g$ attains its maximum. That is,
$D = \{x \in [0, 2] : g(x) = M\}$. Find one point in $D$.

(c) Is D finite, countable, or uncountable? 

Nondifferentiability

When the proper tools are in place, the proof that g is continuous is quite 
straightforward. The more difficult task is to show that g is not differentiable 
at any point in R. 

Let’s first look at the point x = 0. Our function g does not appear to 
be differentiable here, and a rigorous proof is not too difficult. Consider the 
sequence $x_m = 1/2^m$, where $m = 0, 1, 2, \ldots$.

Exercise 5.4.5. Show that
\[ \frac{g(x_m) - g(0)}{x_m - 0} = m + 1, \]
and use this to prove that $g'(0)$ does not exist.

Any temptation to say something like $g'(0) = \infty$ should be resisted. Setting
$x_m = -(1/2^m)$ in the previous argument produces difference quotients heading
toward $-\infty$. The geometric manifestation of this is the “cusp” that appears at
$x = 0$ in the graph of $g$.
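
The difference quotients at zero can be checked numerically. The following is a minimal sketch, assuming the sawtooth is $h(x) = |x|$ on $[-1, 1]$ extended with period 2; the function names and the truncation of the series at 60 terms are illustrative choices, not part of the text.

```python
def h(x):
    """Sawtooth: h(x) = |x| on [-1, 1], extended to the real line with period 2."""
    # Shift x into [-1, 1) before taking the absolute value.
    return abs((x + 1.0) % 2.0 - 1.0)


def g_partial(x, terms=60):
    """Partial sum of g(x) = sum over n >= 0 of h(2^n x) / 2^n (truncated)."""
    return sum(h(2.0**n * x) / 2.0**n for n in range(terms))


def quotient_at_zero(x, terms=60):
    """Difference quotient (g(x) - g(0)) / (x - 0) using the truncated series."""
    return (g_partial(x, terms) - g_partial(0.0, terms)) / x


# Approach zero from the right along x_m = 1/2^m and from the left along -1/2^m.
right = [quotient_at_zero(1.0 / 2.0**m) for m in range(6)]
left = [quotient_at_zero(-1.0 / 2.0**m) for m in range(6)]

for m in range(6):
    print(m, right[m], left[m])
```

The right-hand quotients grow like $m + 1$ while the left-hand ones head the opposite way, which is the numerical shadow of the cusp at the origin. Because every quantity involved is a power of two, the truncated sums here are exact in floating point; that is a convenience of this particular sequence, not a general feature.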


Exercise 5.4.6. (a) Modify the previous argument to show that $g'(1)$ does
not exist. Show that $g'(1/2)$ does not exist.

(b) Show that $g'(x)$ does not exist for any rational number of the form
$x = p/2^k$ where $p \in \mathbf{Z}$ and $k \in \mathbf{N} \cup \{0\}$.


The points described in Exercise 5.4.6 (b) are called dyadic points. If $x = p/2^k$
is a dyadic rational number, then the function $h_n$ has a corner at $x$ as long
as $n \ge k$. Thus, it should not be too surprising that $g$ fails to be differentiable
at points of this form. The argument is more delicate at points between the
dyadic points.

Assume $x$ is not a dyadic number. For a fixed value of $m \in \mathbf{N} \cup \{0\}$, $x$ falls
between two adjacent dyadic points,
\[ \frac{p_m}{2^m} < x < \frac{p_m + 1}{2^m}. \]
Set $x_m = p_m/2^m$ and $y_m = (p_m + 1)/2^m$. Repeating this for each $m$ yields two
sequences $(x_m)$ and $(y_m)$ satisfying
\[ \lim x_m = \lim y_m = x \quad \text{and} \quad x_m < x < y_m. \]

Exercise 5.4.7. (a) First prove the following general lemma: Let $f$ be defined
on an open interval $J$ and assume $f$ is differentiable at $a \in J$. If $(a_n)$ and $(b_n)$
are sequences satisfying $a_n < a < b_n$ and $\lim a_n = \lim b_n = a$, show
\[ f'(a) = \lim_{n \to \infty} \frac{f(b_n) - f(a_n)}{b_n - a_n}. \]

(b) Now use this lemma to show that $g'(x)$ does not exist.

Weierstrass’s original 1872 paper contained a demonstration that the infinite 
sum 

\[ f(x) = \sum_{n=0}^{\infty} a^n \cos(b^n x) \]
defined a continuous nowhere-differentiable function provided $0 < a < 1$ and
$b$ was an odd integer satisfying $ab > 1 + 3\pi/2$. The condition on $a$ is easy to
understand. If $0 < a < 1$, then $\sum_{n=0}^{\infty} a^n$ is a convergent geometric series, and
the forthcoming Weierstrass M-Test (Theorem 6.4.5) can be used to conclude
that $f$ is continuous. The restriction on $b$ is more mysterious. In 1916, G.H.
Hardy extended Weierstrass’ result to include any value of b for which ab > 1. 
Without looking at the details of either of these arguments, we nevertheless get 
a sense that the lack of a derivative is intricately tied to the relationship between 
the compression factor (the parameter a) and the rate at which the frequency 
of the oscillations increases (the parameter b). 

Exercise 5.4.8. Review the argument for the nondifferentiability of $g(x)$ at
nondyadic points. Does the argument still work if we replace $g(x)$ with the
summation $\sum_{n=0}^{\infty} (1/2^n) h(3^n x)$? Does the argument work for the function
$\sum_{n=0}^{\infty} (1/3^n) h(2^n x)$?


5.5 Epilogue 

Far from being an anomaly to be relegated to the margins of our understanding 
of continuous functions, Weierstrass’s example and those like it should actually 
serve as a guide to our intuition. The image of continuity as a smooth curve 
in our mind’s eye severely misrepresents the situation and is the result of a 
bias stemming from an overexposure to the much smaller class of differentiable 
functions. The lesson here is that continuity is a strictly weaker notion than 
differentiability. In Section 3.6, we alluded to a corollary of the Baire Category 
Theorem, which asserts that Weierstrass’s construction is actually typical of 
continuous functions. We will see that most continuous functions are nowhere- 
differentiable, so that it is really the differentiable functions that are the excep- 
tions rather than the rule. The details of how to phrase this observation more 
rigorously are spelled out in Section 8.2. 

To say that the nowhere-differentiable function g constructed in the previous 
section has “corners” at every point of its domain misses the mark. Weierstrass’s 
original class of nowhere-differentiable functions was constructed from infinite 
sums of smooth trigonometric functions. It is the densely nested oscillating
structure that makes the definition of a tangent line impossible. So what hap- 
pens when we restrict our attention to monotone functions? How nondifferen- 
tiable can an increasing function be? Given a finite set of points, it is not difficult 
to piece together a monotone function which has actual corners — and thus is 
not differentiable — at each point in the given set. A natural question is whether 
there exists a continuous, monotone function that is nowhere-differentiable. 
Weierstrass suspected that such a function existed but only managed to produce 
an example of a continuous, increasing function which failed to be differentiable 
on a countable dense set (Exercise 7.5.11). In 1903, the French mathemati- 
cian Henri Lebesgue (1875-1941) demonstrated that Weierstrass’s intuition had 
failed on this account. Lebesgue proved that a continuous, monotone function 
would have to be differentiable at “almost” every point in its domain. To be 
specific, Lebesgue showed that, for every $\epsilon > 0$, the set of points where such a
function fails to be differentiable can be covered by a countable union of intervals
whose lengths sum to less than $\epsilon$. This notion of “zero length,” or “measure
zero” as it is called, was encountered in our discussion of the Cantor set and is 
explored more fully in Section 7.6, where Lebesgue’s substantial contribution to 
the theory of integration is discussed. 

With the relationship between the continuity of $f$ and the existence of $f'$
somewhat in hand, we once more return to the question of characterizing the set 
of all derivatives. Not every function is a derivative. Darboux’s Theorem forces 
us to conclude that there are some functions — those with jump discontinuities 
in particular — that cannot appear as the derivative of some other function. 
Another way to phrase Darboux’s Theorem is to say that all derivatives must 
satisfy the intermediate value property. Continuous functions do possess the 
intermediate value property, and it is natural to ask whether every continuous 
function is necessarily a derivative. For this smaller class of functions, the 
answer is yes. The Fundamental Theorem of Calculus, treated in Chapter 7, 
states that, given a continuous function $f$, the function $F(x) = \int_a^x f$ satisfies
F' = f. This does the trick. The collection of derivatives at least contains the 
continuous functions. The search for a concise characterization of all possible 
derivatives, however, remains largely unsuccessful. 

As a final remark, we will see that by cleverly choosing $f$, this technique
of defining $F$ via $F(x) = \int_a^x f$ can be used to produce examples of continuous
functions which fail to be differentiable on interesting sets, provided we can show
that $\int_a^x f$ is defined. The question of just how to define integration became a
central theme in analysis in the latter half of the 19th century and has continued 
on to the present. Much of this story is discussed in detail in Chapter 7 and 
Section 8.1. 


Chapter 6 


Sequences and Series 
of Functions 

6.1 Discussion: The Power of Power Series 

In 1689, Jakob Bernoulli published his Tractatus de seriebus infinitis summa- 
rizing what was known about infinite series toward the end of the 17th century. 
Full of clever calculations and conclusions, this publication was also notable for 
one particular question that it didn’t answer; namely, what is the precise value 
of the series 

\[ 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots \, ? \]

Bernoulli convincingly argued that $\sum 1/n^2$ converged to something less than
2 (see Example 2.4.4) but he was unable to find an explicit expression for
the limit. Generally speaking, it is much harder to sum a series than it is to
determine whether or not it converges. In fact, being able to find the sum of a
convergent series is the exception rather than the rule. In this case, however, the
series $\sum 1/n^2$ seemed so elementary; more elementary than, say, $\sum n^2/2^n$ or
$\sum 1/(n(n + 1))$, both of which Bernoulli was able to handle. “If anyone finds
and communicates to us that which has so far eluded our efforts,” Bernoulli 
wrote, “great will be our gratitude.” 1 

1 As quoted in [12], which contains a much more thorough account of this story.

Geometric series are the most prominent class of examples that can be readily
summed. In Example 2.7.5 we proved that
\[ \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots \tag{1} \]
for all $|x| < 1$. Thus, for example, $\sum_{n=0}^{\infty} 1/2^n = 2$ and $\sum_{n=0}^{\infty} (-1/3)^n = 3/4$.

Geometric series were part of mathematical folklore long before Bernoulli; how- 
ever, what was relatively novel in Bernoulli’s time was the idea of operating on 
infinite series such as (1) with tools from the budding theory of calculus. For 
instance, what happens if we take the derivative on each side of equation (1)? 
The left side is easy enough — we just get 1/(1 — x) 2 . But what about the right 
side? Adopting a 17th century mindset, a natural way to proceed is to treat the 
infinite series as a polynomial, albeit of infinite degree. Differentiation across 
equation (1) in this fashion gives 

\[ \frac{1}{(1 - x)^2} = 0 + 1 + 2x + 3x^2 + 4x^3 + \cdots. \tag{2} \]

Is this a valid formula, at least for values of $x$ in $(-1, 1)$? Empirical evidence
suggests it is. Setting $x = 1/2$ we get
\[ 4 = \sum_{n=1}^{\infty} \frac{n}{2^{n-1}} = 1 + 1 + \frac{3}{4} + \frac{4}{8} + \frac{5}{16} + \cdots, \]


which feels plausible, and is in fact true. Although not Bernoulli’s requested 
series, this does suggest a possible new line of attack. 
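
The empirical check at $x = 1/2$ is easy to reproduce. The sketch below is only an illustration of equation (2): it sums the first few terms of $1 + 2x + 3x^2 + \cdots$ and compares them with $1/(1 - x)^2$. The function name and the chosen numbers of terms are assumptions made for this example, and numerical agreement is evidence rather than proof.

```python
def differentiated_geometric_partial_sum(x, terms):
    """Sum of the first `terms` terms of 1 + 2x + 3x^2 + 4x^3 + ..."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))


x = 0.5
closed_form = 1.0 / (1.0 - x) ** 2  # left side of equation (2)

term_counts = (5, 10, 20, 40)
partial_sums = [differentiated_geometric_partial_sum(x, N) for N in term_counts]

for N, s in zip(term_counts, partial_sums):
    # The gap to the closed form shrinks quickly as more terms are included.
    print(N, s, closed_form - s)
```

Trying a value such as `x = 1.0` in the same function shows the partial sums growing without bound, consistent with the restriction to $(-1, 1)$.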

Manipulations of this sort can be used to create a wide assortment of new 
series representations for familiar functions. Substituting — x 2 for x in (1) gives 

\[ \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots, \tag{3} \]
for all $x \in (-1, 1)$.

Once again closing our eyes to the potential danger of treating an infinite 
series as though it were a polynomial, let’s see what happens when we take 
antiderivatives. Using the fact that 


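\[ (\arctan(x))' = \frac{1}{1 + x^2} \quad \text{and} \quad \arctan(0) = 0, \]
equation (3) becomes
\[ \arctan(x) = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots. \tag{4} \]
Plugging $x = 1$ into equation (4) yields the striking relationship
\[ \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots. \tag{5} \]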

The constant $\pi$, which arises from the geometry of circles, has somehow found its
way into an equation involving the reciprocals of the odd integers. Is this a valid 
formula? Can we really treat the infinite series in (3) like a finite polynomial? 
Even if the answer is yes there is still another mystery to solve in this example. 
Plugging $x = 1$ into equations (1), (2), or (3) yields mathematical gibberish, so
is it prudent to anticipate something meaningful arising from equation (4) at
this same value? Will any of these ideas get us closer to computing $\sum 1/n^2$?
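
A quick numerical experiment gives some reassurance about the relationship between $\pi/4$ and the reciprocals of the odd integers. The following sketch sums the alternating series $1 - 1/3 + 1/5 - \cdots$; the function name and term counts are illustrative assumptions, and the comparison uses the standard alternating-series error bound (the error is at most the first omitted term).

```python
import math


def leibniz_partial_sum(terms):
    """Sum of the first `terms` terms of 1 - 1/3 + 1/5 - 1/7 + ..."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))


target = math.pi / 4

for N in (10, 100, 1000, 10000):
    s = leibniz_partial_sum(N)
    # The first omitted term, 1/(2N + 1), bounds the distance to the limit.
    print(N, s, abs(s - target), 1 / (2 * N + 1))
```

The convergence is visibly slow: roughly ten times as many terms are needed for each additional correct digit, a reminder that $x = 1$ sits at the very edge of the interval on which the geometric series was known to behave.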

As it turned out, Bernoulli’s plea for help was answered in an unexpected way 
by Leonhard Euler. At a young age, Euler was a student of Jakob Bernoulli’s
brother Johann, and the stellar pupil quickly rose to become the preeminent 
mathematician of his age. Euler’s solution is impossible to anticipate. In 1735, 
he announced that 


\[ 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots = \frac{\pi^2}{6}, \]


a provocative formula that, even more than equation (5), hints at deep con- 
nections between geometry, number theory and analysis. Euler’s argument is 
quite short, but it needs to be viewed in the context of the time in which it was 
created. The “infinite polynomials” in this discussion are examples of power 
series , and a major catalyst for the expanding power of calculus in the 16th and 
17th centuries was a proliferation of techniques like the ones used to generate 
formulas (2), (3), and (4). The machinations of both algebra and calculus are 
relatively straightforward when restricted to the class of polynomials. So, if 
in fact power series could be treated more or less like unending polynomials, 
then there was a great incentive to try to find power series representations for 
familiar functions like $e^x$, $\sqrt{1 + x}$, or $\sin(x)$.

The appearance of arctan(x) in (4) is an encouraging sign that this might 
indeed always be possible. One of Isaac Newton’s more significant achievements 
was to produce a generalization of the binomial formula. If $n \in \mathbf{N}$, then
old-fashioned finite algebra leads to the formula
\[ (1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3 + \cdots + x^n. \]
Through a process of experimentation and intuition Newton realized that for
$r \notin \mathbf{N}$, the infinite series
\[ (1 + x)^r = 1 + rx + \frac{r(r-1)}{2!}x^2 + \frac{r(r-1)(r-2)}{3!}x^3 + \cdots \]
was meaningful, at least for $x \in (-1, 1)$. Setting $r = -1$, for example, yields
\[ \frac{1}{1 + x} = 1 - x + x^2 - x^3 + x^4 - \cdots, \]
which is easily seen to be equivalent to equation (1). Setting $r = 1/2$ we get
\[ \sqrt{1 + x} = 1 + \frac{1}{2}x - \frac{1}{2^2\, 2!}x^2 + \frac{3}{2^3\, 3!}x^3 - \frac{3 \cdot 5}{2^4\, 4!}x^4 + \cdots. \]

One way to lend a little credence to this formula for $\sqrt{1 + x}$ is to focus on the
first few terms and square the series:
\[ \left(\sqrt{1 + x}\right)^2 = \left(1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \cdots\right)\left(1 + \frac{1}{2}x - \frac{1}{8}x^2 + \frac{1}{16}x^3 - \cdots\right) \]
\[ = 1 + x + 0x^2 + 0x^3 + \cdots. \]


Amid all of the unfounded assumptions we are making about infinity, calcula- 
tions like this induce a feeling of optimism about the legitimacy of our search 
for power series representations. 
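
The squaring calculation can be carried out exactly with rational arithmetic. The sketch below generates the first few coefficients of Newton's series for $(1 + x)^r$ from the recurrence $c_k = c_{k-1}(r - (k - 1))/k$ and multiplies the truncated series by itself, discarding powers beyond the truncation. The helper names and the choice of six coefficients are assumptions for illustration only.

```python
from fractions import Fraction


def binomial_coefficients(r, count):
    """First `count` coefficients of the series 1 + r x + r(r-1)/2! x^2 + ..."""
    coeffs = [Fraction(1)]
    for k in range(1, count):
        # Each coefficient comes from the previous one: c_k = c_{k-1} (r - (k-1)) / k.
        coeffs.append(coeffs[-1] * (r - (k - 1)) / k)
    return coeffs


def multiply_truncated(p, q):
    """Multiply two coefficient lists, keeping only powers below len(p)."""
    out = [Fraction(0)] * len(p)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < len(out):
                out[i + j] += a * b
    return out


sqrt_series = binomial_coefficients(Fraction(1, 2), 6)
squared = multiply_truncated(sqrt_series, sqrt_series)

print([str(c) for c in sqrt_series])
print([str(c) for c in squared])
```

Up to the truncation order, the square has coefficient 1 on the constant and linear terms and 0 on every higher power that was kept, which is exactly the pattern $1 + x + 0x^2 + 0x^3 + \cdots$. This says nothing yet about convergence; it only confirms that the formal algebra is consistent.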

Newton’s binomial series is the starting point for a modern proof of Euler’s 
famous sum, which is sketched out in detail in Section 8.3. Euler’s original 
1735 argument, however, started from the power series representation for sin(x). 
The formula 


\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots \]
was known to Newton, Bernoulli, and Euler alike. In contrast to equation (1),
we will see that this formula is valid for all $x \in \mathbf{R}$. Factoring out $x$ and dividing
yields a power series with leading coefficient equal to 1:
\[ \frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots. \tag{6} \]


Euler’s idea was to continue factoring the power series in (6), and his strategy 
for doing this was very much in keeping with what we have seen so far — treat 
the power series as though it were a polynomial and then extend the pattern to 
infinity. 

Factoring a polynomial of, say, degree three is straightforward if we know 
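its roots. If $p(x) = 1 + ax + bx^2 + cx^3$ has roots $r_1$, $r_2$, and $r_3$, then
\[ p(x) = \left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right)\left(1 - \frac{x}{r_3}\right). \]
To see this just directly substitute to get $p(0) = 1$ and $p(r_1) = p(r_2) = p(r_3) = 0$.

The roots of the power series in (6) are the nonzero roots of $\sin x$, or
$x = \pm\pi, \pm 2\pi, \pm 3\pi$, and so on. All right then: relying on his fabled intuition, Euler
surmised that
\[ 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots = \left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right)\left(1 - \frac{x}{3\pi}\right)\left(1 + \frac{x}{3\pi}\right)\cdots \]
\[ = \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{4\pi^2}\right)\left(1 - \frac{x^2}{9\pi^2}\right)\cdots, \tag{7} \]
where in the last step adjacent pairs of factors have been multiplied together.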
What happens if we continue to multiply out the factors on the right? Well, 
the constant term comes out to be 1 which happily matches the constant term 
on the left. The magic comes when we compare the $x^2$ term on each side
of (7). Multiplying out the infinite number of factors on the right (using our 
imagination as necessary) and collecting like powers of x, equation (7) becomes 


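\[ 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} + \cdots = 1 - \left(\frac{1}{\pi^2} + \frac{1}{4\pi^2} + \frac{1}{9\pi^2} + \cdots\right)x^2 + \left(\frac{1}{4\pi^4} + \frac{1}{9\pi^4} + \cdots\right)x^4 - \cdots. \]

Equating the coefficients of $x^2$ on each side yields
\[ -\frac{1}{3!} = -\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \cdots, \]
which when we multiply by $-\pi^2$ becomes
\[ \frac{\pi^2}{6} = 1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \cdots. \]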


Numerical approximations of each side of this equation confirmed for Euler 
that, despite the audacious leaps in his argument, he had landed on solid ground. 
By our standards, this derivation falls well short of being a proper proof, and 
we will have to tend to this in the upcoming chapters. The takeaway of this 
discussion is that the hard work ahead is worth the effort. Infinite series repre- 
sentations of functions are both useful and surprisingly elegant, and can lead to 
remarkable conclusions when they are properly handled. 
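
The kind of numerical confirmation described here is simple to repeat. The sketch below compares partial sums of $\sum 1/n^2$ with $\pi^2/6$ and compares a finite piece of the product $\prod (1 - x^2/(n^2\pi^2))$ with $\sin(x)/x$ at one sample point. The function names, the sample point $x = 1$, and the cutoff of 10000 are arbitrary choices for illustration; agreement to several decimal places is supporting evidence, not a proof.

```python
import math


def basel_partial_sum(terms):
    """Sum of 1/n^2 for n = 1, ..., terms."""
    return sum(1.0 / n**2 for n in range(1, terms + 1))


def euler_product(x, factors):
    """Product of (1 - x^2 / (n^2 pi^2)) for n = 1, ..., factors."""
    product = 1.0
    for n in range(1, factors + 1):
        product *= 1.0 - x**2 / (n**2 * math.pi**2)
    return product


cutoff = 10000
target = math.pi**2 / 6

# The omitted tail of the sum lies between 1/(cutoff + 1) and 1/cutoff.
sum_gap = target - basel_partial_sum(cutoff)

x = 1.0
product_gap = abs(euler_product(x, cutoff) - math.sin(x) / x)

print(sum_gap, product_gap)
```

Both gaps shrink roughly like one over the cutoff, so this is slow convergence, but it is convergence toward the values Euler predicted.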

The evidence so far suggests power series are quite robust when treated as 
if they were finite in nature. Term-by-term differentiation produced a valid 
conclusion in equation (2), and taking antiderivatives fared similarly well in 
(4). We will see that these manipulations are not always justified for infinite 
series of more general types of functions. What is it about power series in 
particular that makes them so impervious to the dangers of the infinite? Of 
the many unanswered questions in this discussion, this last one is probably the 
most central, and the most important to understanding series of functions in 
general. 


6.2 Uniform Convergence of a Sequence 
of Functions 

Adopting the same strategy we used in Chapter 2, we will initially concern 
ourselves with the behavior and properties of converging sequences of func- 
tions. Because convergence of infinite series is defined in terms of the associated 
sequence of partial sums, the results from our study of sequences will be imme- 
diately applicable to the questions we have raised about both power series and 
more general infinite series of functions. 

Figure 6.1: $f_1$, $f_5$, $f_{10}$, and $f_{20}$ where $f_n = (x^2 + nx)/n$.

Pointwise Convergence 

Definition 6.2.1. For each $n \in \mathbf{N}$, let $f_n$ be a function defined on a set $A \subseteq \mathbf{R}$.
The sequence $(f_n)$ of functions converges pointwise on $A$ to a function $f$ if, for
all $x \in A$, the sequence of real numbers $f_n(x)$ converges to $f(x)$.

In this case, we write $f_n \to f$, $\lim f_n = f$, or $\lim_{n \to \infty} f_n(x) = f(x)$. This
last expression is helpful if there is any confusion as to whether $x$ or $n$ is the
limiting variable.


Example 6.2.2. (i) Consider
\[ f_n(x) = (x^2 + nx)/n \]
on all of $\mathbf{R}$. Graphs of $f_1$, $f_5$, $f_{10}$, and $f_{20}$ (Fig. 6.1) give an indication of
what is happening as $n$ gets larger. Algebraically, we can compute
\[ \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \frac{x^2 + nx}{n} = \lim_{n \to \infty} \left(\frac{x^2}{n} + x\right) = x. \]
Thus, $(f_n)$ converges pointwise to $f(x) = x$ on $\mathbf{R}$.

(ii) Let $g_n(x) = x^n$ on the set $[0, 1]$, and consider what happens as $n$ tends to
infinity (Fig. 6.2). If $0 \le x < 1$, then we have seen that $x^n \to 0$. On the
other hand, if $x = 1$, then $x^n \to 1$. It follows that $g_n \to g$ pointwise on
$[0, 1]$, where
\[ g(x) = \begin{cases} 0 & \text{for } 0 \le x < 1 \\ 1 & \text{for } x = 1. \end{cases} \]

(iii) Consider $h_n(x) = x^{1 + \frac{1}{2n - 1}}$ on the set $[-1, 1]$ (Fig. 6.3). For a fixed
$x \in [-1, 1]$ we have
\[ \lim_{n \to \infty} h_n(x) = x \lim_{n \to \infty} x^{\frac{1}{2n - 1}} = |x|. \]

Figure 6.2: $g(x) = \lim_{n \to \infty} x^n$ is not continuous on $[0, 1]$.

Figure 6.3: $h_n \to |x|$ on $[-1, 1]$; limit is not differentiable.


Examples 6.2.2 (ii) and (iii) are our first indication that there is some difficult 
work ahead of us. The central theme of this chapter is analyzing which prop- 
erties the limit function inherits from the approximating sequence. In Example 
6.2.2 (iii) we have a sequence of differentiable functions converging pointwise to 
a limit that is not differentiable at the origin. In Example 6.2.2 (ii), we see an 
even more fundamental problem of a sequence of continuous functions converg- 
ing to a limit that is not continuous. 


Continuity of the Limit Function 

With Example 6.2.2 (ii) firmly in mind, we begin this discussion with a doomed 
attempt to prove that the pointwise limit of continuous functions is continuous. 
Upon discovering the problem in the argument, we will be in a better position 
to understand the need for a stronger notion of convergence for sequences of 
functions. 

Assume $(f_n)$ is a sequence of continuous functions on a set $A \subseteq \mathbf{R}$, and
assume $(f_n)$ converges pointwise to a limit $f$. To argue that $f$ is continuous, fix
a point $c \in A$, and let $\epsilon > 0$. We need to find a $\delta > 0$ such that
\[ |x - c| < \delta \quad \text{implies} \quad |f(x) - f(c)| < \epsilon. \]
By the triangle inequality,
\[ |f(x) - f(c)| = |f(x) - f_n(x) + f_n(x) - f_n(c) + f_n(c) - f(c)| \]
\[ \le |f(x) - f_n(x)| + |f_n(x) - f_n(c)| + |f_n(c) - f(c)|. \]
Our first, optimistic impression is that each term in the sum on the right-hand
side can be made small: the first and third by the fact that $f_n \to f$, and the
middle term by the continuity of $f_n$. In order to use the continuity of $f_n$, we
must first establish which particular $f_n$ we are talking about. Because $c \in A$ is
fixed, choose $N \in \mathbf{N}$ so that
\[ |f_N(c) - f(c)| < \frac{\epsilon}{3}. \]
Now that $N$ is chosen, the continuity of $f_N$ implies that there exists a $\delta > 0$
such that
\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]
for all $x$ satisfying $|x - c| < \delta$.

But here is the problem. We also need
\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \quad \text{for all } x \text{ satisfying } |x - c| < \delta. \]
The values of $x$ depend on $\delta$, which depends on the choice of $N$. Thus, we cannot
go back and simply choose a different $N$. More to the point, the variable $x$ is
not fixed the way $c$ is in this discussion but represents any point in the interval
$(c - \delta, c + \delta)$. Pointwise convergence implies that we can make $|f_n(x) - f(x)| < \epsilon/3$
for large enough values of $n$, but the value of $n$ depends on the point $x$. It is
possible that different values for $x$ will result in the need for different (larger)
choices for $n$. This phenomenon is apparent in Example 6.2.2 (ii). To achieve
the inequality
\[ |g_n(1/2) - g(1/2)| < \frac{1}{3} \]
we need $n \ge 2$, whereas
\[ |g_n(9/10) - g(9/10)| < \frac{1}{3} \]
is true only after $n \ge 11$.
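
The way the required index grows with the point $x$ can be tabulated directly for $g_n(x) = x^n$ on $[0, 1)$, where the pointwise limit is 0. The sketch below is a small illustration under that assumption; the function name and the sample points beyond $1/2$ and $9/10$ are choices made for this example.

```python
def first_index_within(x, eps):
    """Smallest n >= 1 with |x^n - 0| < eps, for a point 0 <= x < 1."""
    n = 1
    while x**n >= eps:
        n += 1
    return n


eps = 1 / 3
sample_points = (0.5, 0.9, 0.99, 0.999)
needed = {x: first_index_within(x, eps) for x in sample_points}

for x in sample_points:
    # The index needed for the same tolerance grows as x approaches 1.
    print(x, needed[x])
```

No single index serves every point of $[0, 1)$ at once: moving $x$ closer to 1 always forces a larger $n$, which is precisely the obstruction the attempted proof runs into.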


Uniform Convergence 

To resolve this dilemma, we define a new, stronger notion of convergence of 
functions. 

Definition 6.2.3 (Uniform Convergence). Let $(f_n)$ be a sequence of functions
defined on a set $A \subseteq \mathbf{R}$. Then, $(f_n)$ converges uniformly on $A$ to a limit
function $f$ defined on $A$ if, for every $\epsilon > 0$, there exists an $N \in \mathbf{N}$ such that
$|f_n(x) - f(x)| < \epsilon$ whenever $n \ge N$ and $x \in A$.


To emphasize the difference between uniform convergence and pointwise con- 
vergence, we restate Definition 6.2.1, being more explicit about the relationship 
between $\epsilon$, $N$, and $x$. In particular, notice where the domain point $x$ is referenced
in each definition and consequently how the choice of $N$ then does or does
not depend on this value. 


Definition 6.2.1B. Let $(f_n)$ be a sequence of functions defined on a set $A \subseteq \mathbf{R}$.
Then, $(f_n)$ converges pointwise on $A$ to a limit $f$ defined on $A$ if, for every
$\epsilon > 0$ and $x \in A$, there exists an $N \in \mathbf{N}$ (perhaps dependent on $x$) such that
$|f_n(x) - f(x)| < \epsilon$ whenever $n \ge N$.


The use of the adverb uniformly here should be reminiscent of its use in 
the phrase “uniformly continuous” from Chapter 4. In both cases, the term 
“uniformly” is employed to express the fact that the response ($\delta$ or $N$) to a
prescribed $\epsilon$ can be chosen to work simultaneously for all values of $x$ in the
relevant domain. 


Example 6.2.4. (i) Let
\[ g_n(x) = \frac{1}{n(1 + x^2)}. \]
For any fixed $x \in \mathbf{R}$, we can see that $\lim g_n(x) = 0$ so that $g(x) = 0$ is the
pointwise limit of the sequence $(g_n)$ on $\mathbf{R}$. Is this convergence uniform?
The observation that $1/(1 + x^2) \le 1$ for all $x \in \mathbf{R}$ implies that
\[ |g_n(x) - g(x)| = \left| \frac{1}{n(1 + x^2)} - 0 \right| \le \frac{1}{n}. \]
Thus, given $\epsilon > 0$, we can choose $N > 1/\epsilon$ (which does not depend on $x$),
and it follows that
\[ n \ge N \quad \text{implies} \quad |g_n(x) - g(x)| < \epsilon \]
for all $x \in \mathbf{R}$. By Definition 6.2.3, $g_n \to 0$ uniformly on $\mathbf{R}$.


(ii) Look back at Example 6.2.2 (i), where we saw that $f_n(x) = (x^2 + nx)/n$
converges pointwise on $\mathbf{R}$ to $f(x) = x$. On $\mathbf{R}$, the convergence is not
uniform. To see this write
\[ |f_n(x) - f(x)| = \left| \frac{x^2 + nx}{n} - x \right| = \frac{x^2}{n}, \]
and notice that in order to force $|f_n(x) - f(x)| < \epsilon$, we are going to have
to choose
\[ N > \frac{x^2}{\epsilon}. \]
Although this is possible to do for each $x \in \mathbf{R}$, there is no way to choose
a single value of $N$ that will work for all values of $x$ at the same time.

On the other hand, we can show that $f_n \to f$ uniformly on the set $[-b, b]$.
By restricting our attention to a bounded interval, we may now assert that
\[ \frac{x^2}{n} \le \frac{b^2}{n}. \]
Given $\epsilon > 0$, then, we can choose
\[ N > \frac{b^2}{\epsilon} \]
independently of $x \in [-b, b]$.



Graphically speaking, the uniform convergence of $f_n$ to a limit $f$ on a set
$A$ can be visualized by constructing a band of radius $\pm\epsilon$ around the limit function
$f$. If $f_n \to f$ uniformly, then there exists a point in the sequence after which
each $f_n$ is completely contained in this $\epsilon$-strip (Fig. 6.4). This image should be
compared with the graphs in Figures 6.1-6.2 from Example 6.2.2 and the one
in Figure 6.5.
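
The $\epsilon$-strip picture can be tested numerically for $f_n(x) = (x^2 + nx)/n$ and $f(x) = x$. The sketch below samples a bounded interval $[-b, b]$ on a finite grid, which is only a stand-in for the true supremum, and checks that one index $N$ chosen larger than $b^2/\epsilon$ keeps $f_N$ inside the strip at every sample point. The values of `b` and `eps`, the grid size, and the function names are all assumptions made for illustration.

```python
def f_n(x, n):
    """The n-th function of the sequence f_n(x) = (x^2 + n x) / n."""
    return (x**2 + n * x) / n


def largest_gap_on_grid(n, points):
    """Largest |f_n(x) - x| over the sample points (approximates the supremum)."""
    return max(abs(f_n(x, n) - x) for x in points)


b = 3.0
eps = 0.125
grid = [-b + 2 * b * k / 1000 for k in range(1001)]  # sample of [-b, b], endpoints included

N = int(b**2 / eps) + 1  # one index chosen without reference to x
worst = largest_gap_on_grid(N, grid)

# Outside the bounded interval the same N stops working.
x_far = 10 * b
escaping = abs(f_n(x_far, N) - x_far)

print(N, worst, escaping)
```

On the grid the worst gap occurs at the endpoints and equals $b^2/N$, comfortably inside the strip, while a point far outside $[-b, b]$ escapes the strip for the same $N$. This mirrors the contrast between uniform convergence on $[-b, b]$ and its failure on all of $\mathbf{R}$.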


Cauchy Criterion 

Recall that the Cauchy Criterion for convergent sequences of real numbers was 
an equivalent characterization of convergence which, unlike the definition, did 
not make explicit mention of the limit. The usefulness of the Cauchy Criterion 
suggests the need for an analogous characterization of uniformly convergent 
sequences of functions. As with all statements about uniformity, pay special 
attention to the relationship between the response variable ($N \in \mathbf{N}$) and the
domain variable ($x \in A$).

Theorem 6.2.5 (Cauchy Criterion for Uniform Convergence). A sequence
of functions $(f_n)$ defined on a set $A \subseteq \mathbf{R}$ converges uniformly on $A$ if
and only if for every $\epsilon > 0$ there exists an $N \in \mathbf{N}$ such that $|f_n(x) - f_m(x)| < \epsilon$
whenever $m, n \ge N$ and $x \in A$.

Proof. Exercise 6.2.5. □ 


Continuity Revisited 

The stronger assumption of uniform convergence is precisely what is required to 
remove the flaws from our attempted proof that the limit of continuous functions 
is continuous. 

Theorem 6.2.6 (Continuous Limit Theorem). Let $(f_n)$ be a sequence of
functions defined on $A \subseteq \mathbf{R}$ that converges uniformly on $A$ to a function $f$. If
each $f_n$ is continuous at $c \in A$, then $f$ is continuous at $c$.

[Figure 6.5: $g_n \to g$ pointwise, but not uniformly.]

Proof. Fix $c \in A$ and let $\epsilon > 0$. Choose $N$ so that

\[ |f_N(x) - f(x)| < \frac{\epsilon}{3} \]

for all $x \in A$. Because $f_N$ is continuous, there exists a $\delta > 0$ for which

\[ |f_N(x) - f_N(c)| < \frac{\epsilon}{3} \]

is true whenever $|x - c| < \delta$. But this implies

\begin{align*}
|f(x) - f(c)| &= |f(x) - f_N(x) + f_N(x) - f_N(c) + f_N(c) - f(c)| \\
&\le |f(x) - f_N(x)| + |f_N(x) - f_N(c)| + |f_N(c) - f(c)| \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{align*}

Thus, $f$ is continuous at $c \in A$. □
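To see why uniform convergence cannot be dropped from the hypothesis, it may help to look at the sequence $g_n(x) = x^n$ on $[0, 1]$ (the same sequence that appears in Exercise 6.2.14). The sketch below is an illustration under assumptions: the sample points $x_n = 1 - 1/n$ and the function names are choices made here, not taken from the text. It suggests that the gap between $g_n$ and its pointwise limit never shrinks below a fixed size, so no single $N$ can work for every $x$.

```python
def g(n: int, x: float) -> float:
    """The n-th function of the sequence, g_n(x) = x**n."""
    return x ** n

def pointwise_limit(x: float) -> float:
    """Pointwise limit of x**n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0

# Evaluate each g_n at a point that drifts toward 1 as n grows.
gaps = []
for n in range(2, 200):
    x_n = 1.0 - 1.0 / n
    gaps.append(abs(g(n, x_n) - pointwise_limit(x_n)))

smallest_gap = min(gaps)  # stays bounded away from zero
```

Each $g_n$ is continuous, yet the limit jumps at $x = 1$; this is consistent with the theorem because the convergence is only pointwise.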
Exercises


Exercise 6.2.1. Let

\[ f_n(x) = \frac{nx}{1 + nx^2}. \]

(a) Find the pointwise limit of $(f_n)$ for all $x \in (0, \infty)$.

(b) Is the convergence uniform on $(0, \infty)$?

(c) Is the convergence uniform on $(0, 1)$?

(d) Is the convergence uniform on $(1, \infty)$?

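Exercise 6.2.2. (a) Define a sequence of functions on $\mathbf{R}$ by

\[ f_n(x) = \begin{cases} 1 & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n} \\ 0 & \text{otherwise} \end{cases} \]

and let $f$ be the pointwise limit of $f_n$.

Is each $f_n$ continuous at zero? Does $f_n \to f$ uniformly on $\mathbf{R}$? Is $f$
continuous at zero?

(b) Repeat this exercise using the sequence of functions

\[ g_n(x) = \begin{cases} x & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n} \\ 0 & \text{otherwise.} \end{cases} \]

(c) Repeat the exercise once more with the sequence

\[ h_n(x) = \begin{cases} 1 & \text{if } x = \frac{1}{n} \\ x & \text{if } x = 1, \frac{1}{2}, \frac{1}{3}, \ldots, \frac{1}{n-1} \\ 0 & \text{otherwise.} \end{cases} \]

In each case, explain how the results are consistent with the content of
the Continuous Limit Theorem (Theorem 6.2.6).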

Exercise 6.2.3. For each $n \in \mathbf{N}$ and $x \in [0, \infty)$, let

\[ g_n(x) = \frac{x}{1 + x^n} \quad \text{and} \quad h_n(x) = \begin{cases} 1 & \text{if } x \ge 1/n \\ nx & \text{if } 0 \le x < 1/n. \end{cases} \]

Answer the following questions for the sequences $(g_n)$ and $(h_n)$:

(a) Find the pointwise limit on $[0, \infty)$.

(b) Explain how we know that the convergence cannot be uniform on $[0, \infty)$.

(c) Choose a smaller set over which the convergence is uniform and supply an
argument to show that this is indeed the case.




Exercise 6.2.4. Review Exercise 5.2.8 which includes the definition for a
uniformly differentiable function. Use the results discussed in Section 6.2 to
show that if $f$ is uniformly differentiable, then $f'$ is continuous.

Exercise 6.2.5. Using the Cauchy Criterion for convergent sequences of real
numbers (Theorem 2.6.4), supply a proof for Theorem 6.2.5. (First, define a
candidate for $f(x)$, and then argue that $f_n \to f$ uniformly.)

Exercise 6.2.6. Assume $f_n \to f$ on a set $A$. Theorem 6.2.6 is an example
of a typical type of question which asks whether a trait possessed by each $f_n$
is inherited by the limit function. Provide an example to show that all of
the following propositions are false if the convergence is only assumed to be
pointwise on $A$. Then go back and decide which are true under the stronger
hypothesis of uniform convergence.

(a) If each $f_n$ is uniformly continuous, then $f$ is uniformly continuous.

(b) If each $f_n$ is bounded, then $f$ is bounded.

(c) If each $f_n$ has a finite number of discontinuities, then $f$ has a finite number
of discontinuities.

(d) If each $f_n$ has fewer than $M$ discontinuities (where $M \in \mathbf{N}$ is fixed), then
$f$ has fewer than $M$ discontinuities.

(e) If each $f_n$ has at most a countable number of discontinuities, then $f$ has
at most a countable number of discontinuities.


Exercise 6.2.7. Let $f$ be uniformly continuous on all of $\mathbf{R}$, and define a
sequence of functions by $f_n(x) = f(x + \frac{1}{n})$. Show that $f_n \to f$ uniformly. Give an
example to show that this proposition fails if $f$ is only assumed to be continuous
and not uniformly continuous on $\mathbf{R}$.

Exercise 6.2.8. Let $(g_n)$ be a sequence of continuous functions that converges
uniformly to $g$ on a compact set $K$. If $g(x) \ne 0$ on $K$, show $(1/g_n)$ converges
uniformly on $K$ to $1/g$.

Exercise 6.2.9. Assume $(f_n)$ and $(g_n)$ are uniformly convergent sequences of
functions.

(a) Show that $(f_n + g_n)$ is a uniformly convergent sequence of functions.

(b) Give an example to show that the product $(f_n g_n)$ may not converge
uniformly.

(c) Prove that if there exists an $M > 0$ such that $|f_n| \le M$ and $|g_n| \le M$ for
all $n \in \mathbf{N}$, then $(f_n g_n)$ does converge uniformly.

Exercise 6.2.10. This exercise and the next explore partial converses of the
Continuous Limit Theorem (Theorem 6.2.6). Assume $f_n \to f$ pointwise on $[a, b]$
and the limit function $f$ is continuous on $[a, b]$. If each $f_n$ is increasing (but not
necessarily continuous), show $f_n \to f$ uniformly.
Exercise 6.2.11 (Dini's Theorem). Assume $f_n \to f$ pointwise on a compact
set $K$ and assume that for each $x \in K$ the sequence $f_n(x)$ is increasing. Follow
these steps to show that if $f_n$ and $f$ are continuous on $K$, then the convergence
is uniform.

(a) Set $g_n = f - f_n$ and translate the preceding hypothesis into statements
about the sequence $(g_n)$.

(b) Let $\epsilon > 0$ be arbitrary, and define $K_n = \{x \in K : g_n(x) \ge \epsilon\}$. Argue that
$K_1 \supseteq K_2 \supseteq K_3 \supseteq \cdots$, and use this observation to finish the argument.

Exercise 6.2.12 (Cantor Function). Review the construction of the Cantor
set $C \subseteq [0, 1]$ from Section 3.1. This exercise makes use of results and notation
from this discussion.

(a) Define $f_0(x) = x$ for all $x \in [0, 1]$. Now, let

\[ f_1(x) = \begin{cases} (3/2)x & \text{for } 0 \le x \le 1/3 \\ 1/2 & \text{for } 1/3 < x < 2/3 \\ (3/2)x - 1/2 & \text{for } 2/3 \le x \le 1. \end{cases} \]

Sketch $f_0$ and $f_1$ over $[0, 1]$ and observe that $f_1$ is continuous, increasing,
and constant on the middle third $(1/3, 2/3) = [0, 1] \setminus C_1$.

(b) Construct $f_2$ by imitating this process of flattening out the middle third
of each nonconstant segment of $f_1$. Specifically, let

\[ f_2(x) = \begin{cases} (1/2) f_1(3x) & \text{for } 0 \le x \le 1/3 \\ f_1(x) & \text{for } 1/3 < x < 2/3 \\ (1/2) f_1(3x - 2) + 1/2 & \text{for } 2/3 \le x \le 1. \end{cases} \]

If we continue this process, show that the resulting sequence $(f_n)$ converges
uniformly on $[0, 1]$.

(c) Let $f = \lim f_n$. Prove that $f$ is a continuous, increasing function on $[0, 1]$
with $f(0) = 0$ and $f(1) = 1$ that satisfies $f'(x) = 0$ for all $x$ in the open
set $[0, 1] \setminus C$. Recall that the "length" of the Cantor set $C$ is 0. Somehow,
$f$ manages to increase from 0 to 1 while remaining constant on a set of
"length 1."
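The recursive construction in parts (a) and (b) can be explored numerically before attempting the proof. The sketch below is only exploratory: it assumes the same recursion is applied at every stage (as part (b) describes for $f_2$), uses exact rational arithmetic, and measures the gap between consecutive functions on a finite grid, which is evidence rather than an argument. The names `f`, `GRID`, and `sup_gap` are chosen here for illustration.

```python
from fractions import Fraction

def f(n: int, x: Fraction) -> Fraction:
    """n-th stage of the construction, with f(0, x) = x on [0, 1]."""
    if n == 0:
        return x
    if x <= Fraction(1, 3):
        return f(n - 1, 3 * x) / 2                    # left third: scaled copy
    if x < Fraction(2, 3):
        return Fraction(1, 2)                         # middle third: constant
    return f(n - 1, 3 * x - 2) / 2 + Fraction(1, 2)   # right third: shifted copy

# Grid of multiples of 1/81, which contains the points 1/3, 1/9, 1/27, 1/81.
GRID = [Fraction(k, 81) for k in range(82)]

def sup_gap(n: int) -> Fraction:
    """Largest |f_{n+1}(x) - f_n(x)| over the grid."""
    return max(abs(f(n + 1, x) - f(n, x)) for x in GRID)

gaps = [sup_gap(n) for n in range(4)]
```

On this grid the gaps halve at each stage, which hints at a comparison with a geometric series and an appeal to the Cauchy Criterion (Theorem 6.2.5).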

Exercise 6.2.13. Recall that the Bolzano-Weierstrass Theorem (Theorem
2.5.5) states that every bounded sequence of real numbers has a convergent
subsequence. An analogous statement for bounded sequences of functions is not
true in general, but under stronger hypotheses several different conclusions are
possible. One avenue is to assume the common domain for all of the functions
in the sequence is countable. (Another is explored in the next two exercises.)

Let $A = \{x_1, x_2, x_3, \ldots\}$ be a countable set. For each $n \in \mathbf{N}$, let $f_n$ be
defined on $A$ and assume there exists an $M > 0$ such that $|f_n(x)| \le M$ for all
$n \in \mathbf{N}$ and $x \in A$. Follow these steps to show that there exists a subsequence
of $(f_n)$ that converges pointwise on $A$.

(a) Why does the sequence of real numbers $f_n(x_1)$ necessarily contain a
convergent subsequence $(f_{n_k})$? To indicate that the subsequence of functions
$(f_{n_k})$ is generated by considering the values of the functions at $x_1$, we will
use the notation $f_{n_k} = f_{1,k}$.

(b) Now, explain why the sequence $f_{1,k}(x_2)$ contains a convergent subsequence.

(c) Carefully construct a nested family of subsequences $(f_{m,k})$, and show how
this can be used to produce a single subsequence of $(f_n)$ that converges
at every point of $A$.


Exercise 6.2.14. A sequence of functions $(f_n)$ defined on a set $E \subseteq \mathbf{R}$ is called
equicontinuous if for every $\epsilon > 0$ there exists a $\delta > 0$ such that
$|f_n(x) - f_n(y)| < \epsilon$ for all $n \in \mathbf{N}$ and $|x - y| < \delta$ in $E$.

(a) What is the difference between saying that a sequence of functions $(f_n)$ is
equicontinuous and just asserting that each $f_n$ in the sequence is
individually uniformly continuous?

(b) Give a qualitative explanation for why the sequence $g_n(x) = x^n$ is not
equicontinuous on $[0, 1]$. Is each $g_n$ uniformly continuous on $[0, 1]$?

Exercise 6.2.15 (Arzelà-Ascoli Theorem). For each $n \in \mathbf{N}$, let $f_n$ be a
function defined on $[0, 1]$. If $(f_n)$ is bounded on $[0, 1]$ (that is, there exists an
$M > 0$ such that $|f_n(x)| \le M$ for all $n \in \mathbf{N}$ and $x \in [0, 1]$) and if the collection
of functions $(f_n)$ is equicontinuous (Exercise 6.2.14), follow these steps to show
that $(f_n)$ contains a uniformly convergent subsequence.

(a) Use Exercise 6.2.13 to produce a subsequence $(f_{n_k})$ that converges at every
rational point in $[0, 1]$. To simplify the notation, set $g_k = f_{n_k}$. It remains
to show that $(g_k)$ converges uniformly on all of $[0, 1]$.

(b) Let $\epsilon > 0$. By equicontinuity, there exists a $\delta > 0$ such that

\[ |g_k(x) - g_k(y)| < \frac{\epsilon}{3} \]

for all $|x - y| < \delta$ and $k \in \mathbf{N}$. Using this $\delta$, let $r_1, r_2, \ldots, r_m$ be a
finite collection of rational points with the property that the union of
the neighborhoods $V_\delta(r_i)$ contains $[0, 1]$.

Explain why there must exist an $N \in \mathbf{N}$ such that

\[ |g_s(r_i) - g_t(r_i)| < \frac{\epsilon}{3} \]

for all $s, t \ge N$ and $r_i$ in the finite subset of $[0, 1]$ just described. Why
does having the set $\{r_1, r_2, \ldots, r_m\}$ be finite matter?

(c) Finish the argument by showing that, for an arbitrary $x \in [0, 1]$,

\[ |g_s(x) - g_t(x)| < \epsilon \]

for all $s, t \ge N$.
6.3 Uniform Convergence and Differentiation

Example 6.2.2 (iii) imposes some significant restrictions on what we might hope
to be true regarding differentiation and uniform convergence. If $h_n \to h$
uniformly and each $h_n$ is differentiable, we should not anticipate that $h_n' \to h'$
because in this example $h'(x)$ does not even exist at $x = 0$. There are also
examples (see Exercise 6.3.4) where $f_n \to f$ uniformly with $(f_n)$ and $f$ all
differentiable, but the sequence $(f_n')$ diverges at every point of the domain.

The key assumption necessary to be able to prove any facts about the 
derivative of the limit function is that the sequence of derivatives be uniformly 
convergent. This may sound as though we are assuming what it is we would 
like to prove, and there is some validity to this complaint. The more hypotheses 
a proposition has, the more difficult it is to apply. The content of the next 
theorem is that if we are given a pointwise convergent sequence of differentiable 
functions, and if we know that the sequence of derivatives converges uniformly 
to something, then the limit of the derivatives is indeed the derivative of the
limit. 


Theorem 6.3.1 (Differentiable Limit Theorem). Let $f_n \to f$ pointwise
on the closed interval $[a, b]$, and assume that each $f_n$ is differentiable. If $(f_n')$
converges uniformly on $[a, b]$ to a function $g$, then the function $f$ is differentiable
and $f' = g$.


Proof. Fix $c \in [a, b]$ and let $\epsilon > 0$. We want to argue that $f'(c)$ exists and equals
$g(c)$. Because $f'$ is defined by the limit

\[ f'(c) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c}, \]

our task is to produce a $\delta > 0$ so that

\[ \left| \frac{f(x) - f(c)}{x - c} - g(c) \right| < \epsilon \]

whenever $0 < |x - c| < \delta$.

To motivate the strategy of the proof, observe that for all $x \ne c$ and all
$n \in \mathbf{N}$, the triangle inequality implies

\begin{align*}
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right|
&\le \left| \frac{f(x) - f(c)}{x - c} - \frac{f_n(x) - f_n(c)}{x - c} \right| \\
&\quad + \left| \frac{f_n(x) - f_n(c)}{x - c} - f_n'(c) \right| + |f_n'(c) - g(c)|.
\end{align*}

Our intent is to first find an $f_n$ that forces the first and third terms on the
right-hand side to be less than $\epsilon/3$. Once we establish which $f_n$ we want, we
can then use the differentiability of $f_n$ to produce a $\delta$ that makes the middle
term less than $\epsilon/3$ for all $x$ satisfying $0 < |x - c| < \delta$.
Let's start by choosing an $N_1$ such that

\[ |f_m'(c) - g(c)| < \frac{\epsilon}{3} \tag{1} \]

for all $m \ge N_1$. We now invoke the uniform convergence of $(f_n')$ to assert (via
Theorem 6.2.5) that there exists an $N_2$ such that $m, n \ge N_2$ implies

\[ |f_m'(x) - f_n'(x)| < \frac{\epsilon}{3} \quad \text{for all } x \in [a, b]. \]

Set $N = \max\{N_1, N_2\}$.

The function $f_N$ is differentiable at $c$, and so there exists a $\delta > 0$ for which

\[ \left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| < \frac{\epsilon}{3} \tag{2} \]

whenever $0 < |x - c| < \delta$. This is our sought after $\delta$, but it takes some effort to
show that it has the desired property.

Fix an $x$ satisfying $0 < |x - c| < \delta$, let $m \ge N$, and apply the Mean Value
Theorem to $f_m - f_N$ on the interval $[c, x]$. (If $x < c$ the argument is the same.)
By MVT, there exists an $\alpha \in (c, x)$ such that

\[ f_m'(\alpha) - f_N'(\alpha) = \frac{(f_m(x) - f_N(x)) - (f_m(c) - f_N(c))}{x - c}. \]

Recall that our choice of $N$ implies

\[ |f_m'(\alpha) - f_N'(\alpha)| < \frac{\epsilon}{3}, \]

and so it follows that

\[ \left| \frac{f_m(x) - f_m(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| < \frac{\epsilon}{3}. \]

Because $f_m \to f$, we can take the limit as $m \to \infty$, and the Order Limit Theorem
(Theorem 2.3.4) asserts that

\[ \left| \frac{f(x) - f(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| \le \frac{\epsilon}{3}. \tag{3} \]

Finally, the inequalities in (1), (2), and (3) together imply that for $x$ satisfying
$0 < |x - c| < \delta$,

\begin{align*}
\left| \frac{f(x) - f(c)}{x - c} - g(c) \right|
&\le \left| \frac{f(x) - f(c)}{x - c} - \frac{f_N(x) - f_N(c)}{x - c} \right| \\
&\quad + \left| \frac{f_N(x) - f_N(c)}{x - c} - f_N'(c) \right| + |f_N'(c) - g(c)| \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon.
\end{align*}

□
The hypothesis in the Differentiable Limit Theorem is unnecessarily strong.
We actually do not need to assume that $f_n(x) \to f(x)$ at each point in the
domain because the assumption that the sequence of derivatives $(f_n')$ converges
uniformly is nearly strong enough to prove that $(f_n)$ converges, uniformly in
fact. Two functions with the same derivative may differ by a constant, so we
must assume that there is at least one point $x_0$ where $f_n(x_0) \to f(x_0)$.


Theorem 6.3.2. Let $(f_n)$ be a sequence of differentiable functions defined on
the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly on $[a, b]$. If there
exists a point $x_0 \in [a, b]$ where $f_n(x_0)$ is convergent, then $(f_n)$ converges
uniformly on $[a, b]$.


Proof. Exercise 6.3.7. □


Combining the last two results produces a stronger version of Theorem 6.3.1.


Theorem 6.3.3. Let $(f_n)$ be a sequence of differentiable functions defined on
the closed interval $[a, b]$, and assume $(f_n')$ converges uniformly to a function $g$ on
$[a, b]$. If there exists a point $x_0 \in [a, b]$ for which $f_n(x_0)$ is convergent, then $(f_n)$
converges uniformly. Moreover, the limit function $f = \lim f_n$ is differentiable
and satisfies $f' = g$.
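A small numerical experiment can make the hypotheses of Theorem 6.3.3 concrete. The example below is hypothetical and not from the text: it assumes $f_n(x) = \sin x + x^2/(2n)$ on $[-2, 2]$, so that $f_n'(x) = \cos x + x/n$, the candidate limit of the derivatives is $g(x) = \cos x$, and $f_n(0) = 0$ converges. The sketch estimates the uniform distance between $f_n'$ and $g$ on a grid.

```python
import math

A, B = -2.0, 2.0                                   # hypothetical interval [a, b]
GRID = [A + (B - A) * k / 400 for k in range(401)]

def df_n(n: int, x: float) -> float:
    """Derivative of the assumed f_n(x) = sin(x) + x**2 / (2 n)."""
    return math.cos(x) + x / n

def g(x: float) -> float:
    """Candidate uniform limit of the derivatives."""
    return math.cos(x)

def sup_derivative_gap(n: int) -> float:
    """Largest |f_n'(x) - g(x)| over the grid; the exact supremum is 2 / n."""
    return max(abs(df_n(n, x) - g(x)) for x in GRID)

ns = (1, 10, 100, 1000)
gaps = [sup_derivative_gap(n) for n in ns]
```

Since the gaps shrink like $2/n$ independently of $x$, the derivatives converge uniformly, and the theorem then gives $f = \sin$ with $f' = \cos = g$ in this example.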


Exercises 

Exercise 6.3.1. Consider the sequence of functions defined by

\[ g_n(x) = \frac{x^n}{n}. \]

(a) Show $(g_n)$ converges uniformly on $[0, 1]$ and find $g = \lim g_n$. Show that $g$
is differentiable and compute $g'(x)$ for all $x \in [0, 1]$.

(b) Now, show that $(g_n')$ converges on $[0, 1]$. Is the convergence uniform? Set
$h = \lim g_n'$ and compare $h$ and $g'$. Are they the same?

Exercise 6.3.2. Consider the sequence of functions

\[ h_n(x) = \sqrt{x^2 + \frac{1}{n}}. \]

(a) Compute the pointwise limit of $(h_n)$ and then prove that the convergence
is uniform on $\mathbf{R}$.

(b) Note that each $h_n$ is differentiable. Show $g(x) = \lim h_n'(x)$ exists for all
$x$, and explain how we can be certain that the convergence is not uniform
on any neighborhood of zero.

Exercise 6.3.3. Consider the sequence of functions

\[ f_n(x) = \frac{x}{1 + nx^2}. \]

(a) Find the points on $\mathbf{R}$ where each $f_n(x)$ attains its maximum and minimum
value. Use this to prove $(f_n)$ converges uniformly on $\mathbf{R}$. What is the limit
function?

(b) Let $f = \lim f_n$. Compute $f_n'(x)$ and find all the values of $x$ for which
$f'(x) = \lim f_n'(x)$.

Exercise 6.3.4. Let 

sin (nx) 


Show that $h_n \to 0$ uniformly on $\mathbf{R}$ but that the sequence of derivatives $(h_n')$
diverges for every $x \in \mathbf{R}$.




Exercise 6.3.5. Let

\[ g_n(x) = \frac{nx + x^2}{2n}, \]

and set $g(x) = \lim g_n(x)$. Show that $g$ is differentiable in two ways:

(a) Compute $g(x)$ by algebraically taking the limit as $n \to \infty$ and then
find $g'(x)$.

(b) Compute $g_n'(x)$ for each $n \in \mathbf{N}$ and show that the sequence of derivatives
$(g_n')$ converges uniformly on every interval $[-M, M]$. Use Theorem 6.3.3
to conclude $g'(x) = \lim g_n'(x)$.

(c) Repeat parts (a) and (b) for the sequence $f_n(x) = (nx^2 + 1)/(2n + x)$.


Exercise 6.3.6. Provide an example or explain why the request is impossible. 
Let’s take the domain of the functions to be all of R. 


(a) A sequence $(f_n)$ of nowhere differentiable functions with $f_n \to f$ uniformly
and $f$ everywhere differentiable.

(b) A sequence $(f_n)$ of differentiable functions such that $(f_n')$ converges
uniformly but the original sequence $(f_n)$ does not converge for any $x \in \mathbf{R}$.

(c) A sequence $(f_n)$ of differentiable functions such that both $(f_n)$ and $(f_n')$
converge uniformly but $f = \lim f_n$ is not differentiable at some point.


Exercise 6.3.7. Use the Mean Value Theorem to supply a proof for
Theorem 6.3.2. To get started, observe that the triangle inequality implies that, for
any $x \in [a, b]$ and $m, n \in \mathbf{N}$,

\[ |f_n(x) - f_m(x)| \le |(f_n(x) - f_m(x)) - (f_n(x_0) - f_m(x_0))| + |f_n(x_0) - f_m(x_0)|. \]
6.4 Series of Functions

Definition 6.4.1. For each $n \in \mathbf{N}$, let $f_n$ and $f$ be functions defined on a set
$A \subseteq \mathbf{R}$. The infinite series

\[ \sum_{n=1}^{\infty} f_n(x) = f_1(x) + f_2(x) + f_3(x) + \cdots \]

converges pointwise on $A$ to $f(x)$ if the sequence $s_k(x)$ of partial sums defined by

\[ s_k(x) = f_1(x) + f_2(x) + \cdots + f_k(x) \]

converges pointwise to $f(x)$. The series converges uniformly on $A$ to $f$ if the
sequence $s_k(x)$ converges uniformly on $A$ to $f(x)$.

In either case, we write $f = \sum_{n=1}^{\infty} f_n$ or $f(x) = \sum_{n=1}^{\infty} f_n(x)$, always being
explicit about the type of convergence involved.

If we have a series $\sum_{n=1}^{\infty} f_n$ where the functions $f_n$ are continuous, then
the Algebraic Continuity Theorem (Theorem 4.3.4) guarantees that the partial 
sums — because they are finite sums — will be continuous as well. A correspond- 
ing observation is true if we are dealing with differentiable functions. As a 
consequence, we can immediately translate the results for sequences in the pre- 
vious sections into statements about the behavior of infinite series of functions. 

Theorem 6.4.2 (Term-by-term Continuity Theorem). Let $f_n$ be continuous
functions defined on a set $A \subseteq \mathbf{R}$, and assume $\sum_{n=1}^{\infty} f_n$ converges uniformly
on $A$ to a function $f$. Then, $f$ is continuous on $A$.

Proof. Apply the Continuous Limit Theorem (Theorem 6.2.6) to the partial
sums $s_k = f_1 + f_2 + \cdots + f_k$. □

Theorem 6.4.3 (Term-by-term Differentiability Theorem). Let $f_n$ be
differentiable functions defined on an interval $A$, and assume $\sum_{n=1}^{\infty} f_n'(x)$
converges uniformly to a limit $g(x)$ on $A$. If there exists a point $x_0 \in [a, b]$ where
$\sum_{n=1}^{\infty} f_n(x_0)$ converges, then the series $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to a
differentiable function $f(x)$ satisfying $f'(x) = g(x)$ on $A$. In other words,

\[ f(x) = \sum_{n=1}^{\infty} f_n(x) \quad \text{and} \quad f'(x) = \sum_{n=1}^{\infty} f_n'(x). \]

Proof. Apply the stronger form of the Differentiable Limit Theorem (Theorem
6.3.3) to the partial sums $s_k = f_1 + f_2 + \cdots + f_k$. Observe that Theorem 5.2.4
implies that $s_k' = f_1' + f_2' + \cdots + f_k'$. □

In the vocabulary of infinite series, the Cauchy Criterion takes the following 
form. 
Theorem 6.4.4 (Cauchy Criterion for Uniform Convergence of Series).
A series $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A \subseteq \mathbf{R}$ if and only if for every $\epsilon > 0$
there exists an $N \in \mathbf{N}$ such that

\[ |f_{m+1}(x) + f_{m+2}(x) + f_{m+3}(x) + \cdots + f_n(x)| < \epsilon \]

whenever $n > m \ge N$ and $x \in A$.

The benefits of uniform convergence over pointwise convergence suggest the 
need for some ways of determining when a series converges uniformly. The fol- 
lowing corollary to the Cauchy Criterion is the most common such tool. In 
particular, it will be quite useful in our upcoming investigations of power series. 

Corollary 6.4.5 (Weierstrass M-Test). For each $n \in \mathbf{N}$, let $f_n$ be a function
defined on a set $A \subseteq \mathbf{R}$, and let $M_n > 0$ be a real number satisfying

\[ |f_n(x)| \le M_n \]

for all $x \in A$. If $\sum_{n=1}^{\infty} M_n$ converges, then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$.

Proof. Exercise 6.4.1. □ 
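The M-Test can be watched in action on the series $\sum_{n=0}^{\infty} \cos(2^n x)/2^n$ from Exercise 6.4.3, taking $M_n = 2^{-n}$. The sketch below is a numerical illustration only: the sample points, the indices `N` and `K`, and the function names are assumptions made here. It compares the difference of two partial sums with the tail $\sum_{n > N} 2^{-n} = 2^{-N}$, a bound that does not depend on $x$.

```python
import math

def partial_sum(k: int, x: float) -> float:
    """s_k(x) = sum of cos(2**n * x) / 2**n for n = 0, ..., k."""
    return sum(math.cos(2 ** n * x) / 2 ** n for n in range(k + 1))

def tail_bound(N: int) -> float:
    """Sum of M_n = 2**-n over n > N, a geometric tail equal to 2**-N."""
    return 2.0 ** -N

xs = [-10.0 + 0.37 * j for j in range(60)]   # arbitrary sample points
N, K = 8, 30

# The same tail bound controls |s_K(x) - s_N(x)| at every sample point.
worst = max(abs(partial_sum(K, x) - partial_sum(N, x)) for x in xs)
```

This is the Cauchy Criterion for series (Theorem 6.4.4) in miniature: the bound on the block of terms between `N` and `K` comes from the constants $M_n$, not from the particular $x$.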


Exercises 

Exercise 6.4.1. Supply the details for the proof of the Weierstrass M-Test 
(Corollary 6.4.5). 

Exercise 6.4.2. Decide whether each proposition is true or false, providing a 
short justification or counterexample as appropriate. 

(a) If $\sum_{n=1}^{\infty} g_n$ converges uniformly, then $(g_n)$ converges uniformly to zero.

(b) If $0 \le f_n(x) \le g_n(x)$ and $\sum_{n=1}^{\infty} g_n$ converges uniformly, then $\sum_{n=1}^{\infty} f_n$
converges uniformly.

(c) If $\sum_{n=1}^{\infty} f_n$ converges uniformly on $A$, then there exist constants $M_n$ such
that $|f_n(x)| \le M_n$ for all $x \in A$ and $\sum_{n=1}^{\infty} M_n$ converges.

Exercise 6.4.3. (a) Show that

\[ g(x) = \sum_{n=0}^{\infty} \frac{\cos(2^n x)}{2^n} \]

is continuous on all of $\mathbf{R}$.

(b) The function $g$ was cited in Section 5.4 as an example of a continuous
nowhere differentiable function. What happens if we try to use Theorem
6.4.3 to explore whether $g$ is differentiable?
Exercise 6.4.4. Define

\[ g(x) = \sum_{n=0}^{\infty} \frac{x^{2n}}{1 + x^{2n}}. \]

Find the values of $x$ where the series converges and show that we get a continuous
function on this set.

Exercise 6.4.5. (a) Prove that

\[ h(x) = \sum_{n=1}^{\infty} \frac{x^n}{n^2} = x + \frac{x^2}{4} + \frac{x^3}{9} + \frac{x^4}{16} + \cdots \]

is continuous on $[-1, 1]$.

(b) The series

\[ f(x) = \sum_{n=1}^{\infty} \frac{x^n}{n} = x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots \]

converges for every $x$ in the half-open interval $[-1, 1)$ but does not converge
when $x = 1$. For a fixed $x_0 \in (-1, 1)$, explain how we can still use the
Weierstrass M-Test to prove that $f$ is continuous at $x_0$.


Exercise 6.4.6. Let

\[ f(x) = \frac{1}{x} - \frac{1}{x+1} + \frac{1}{x+2} - \frac{1}{x+3} + \frac{1}{x+4} - \cdots. \]

Show $f$ is defined for all $x > 0$. Is $f$ continuous on $(0, \infty)$? How about
differentiable?


Exercise 6.4.7. Let

\[ f(x) = \sum_{k=1}^{\infty} \frac{\sin(kx)}{k^3}. \]

(a) Show that $f(x)$ is differentiable and that the derivative $f'(x)$ is continuous.

(b) Can we determine if $f$ is twice-differentiable?


Exercise 6.4.8. Consider the function

\[ f(x) = \sum_{k=1}^{\infty} \frac{\sin(x/k)}{k}. \]

Where is $f$ defined? Continuous? Differentiable? Twice-differentiable?


Exercise 6.4.9. Let

\[ h(x) = \sum_{n=1}^{\infty} \frac{1}{x^2 + n^2}. \]

(a) Show that $h$ is a continuous function defined on all of $\mathbf{R}$.

(b) Is $h$ differentiable? If so, is the derivative function $h'$ continuous?

Exercise 6.4.10. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the set of rational
numbers. For each $r_n \in \mathbf{Q}$, define

\[ u_n(x) = \begin{cases} 1/2^n & \text{for } x > r_n \\ 0 & \text{for } x \le r_n. \end{cases} \]

Now, let $h(x) = \sum_{n=1}^{\infty} u_n(x)$. Prove that $h$ is a monotone function defined on
all of $\mathbf{R}$ that is continuous at every irrational point.


6.5 Power Series 


It is time to put some mathematical teeth into our understanding of functions 
expressed in the form of a power series; that is, functions of the form 


\[ f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots. \]

The first order of business is to determine the points $x \in \mathbf{R}$ for which the
resulting series on the right-hand side converges. This set certainly contains
$x = 0$, and, as the next result demonstrates, it takes a very predictable form.


Theorem 6.5.1. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges at some point $x_0 \in \mathbf{R}$,
then it converges absolutely for any $x$ satisfying $|x| < |x_0|$.

Proof. If $\sum_{n=0}^{\infty} a_n x_0^n$ converges, then the sequence of terms $(a_n x_0^n)$ is bounded.
(In fact, it converges to 0.) Let $M > 0$ satisfy $|a_n x_0^n| \le M$ for all $n \in \mathbf{N}$. If
$x \in \mathbf{R}$ satisfies $|x| < |x_0|$, then

\[ |a_n x^n| = |a_n x_0^n| \left| \frac{x}{x_0} \right|^n \le M \left| \frac{x}{x_0} \right|^n. \]

But notice that

\[ \sum_{n=0}^{\infty} M \left| \frac{x}{x_0} \right|^n \]

is a geometric series with ratio $|x/x_0| < 1$ and so converges. By the Comparison
Test, $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely. □


The main implication of Theorem 6.5.1 is that the set of points for which a
given power series converges must necessarily be $\{0\}$, $\mathbf{R}$, or a bounded interval
centered around $x = 0$. Because of the strict inequality in Theorem 6.5.1, there
is some ambiguity about the endpoints of the interval, and it is possible that
the set of convergent points may be of the form $(-R, R)$, $[-R, R)$, $(-R, R]$, or
$[-R, R]$.
The value of $R$ is referred to as the radius of convergence of a power series,
and it is customary to assign $R$ the value 0 or $\infty$ to represent the set $\{0\}$
or $\mathbf{R}$, respectively. Some of the standard devices for computing the radius of
convergence for a power series are explored in the exercises. Of more interest 
to us here is the investigation of the properties of functions defined in this way. 
Are they continuous? Are they differentiable? If so, can we differentiate the 
series term-by-term? What happens at the endpoints? 


Establishing Uniform Convergence 

The positive answers to the preceding questions, and the usefulness of power 
series in general, are largely due to the fact that they converge uniformly on 
compact sets contained in their domain of convergent points. As we are about to 
see, a complete proof of this fact requires a fairly delicate argument attributed 
to the Norwegian mathematician Niels Henrik Abel. A significant amount of 
progress, however, can be made with the Weierstrass M-Test (Corollary 6.4.5). 


Theorem 6.5.2. If a power series $\sum_{n=0}^{\infty} a_n x^n$ converges absolutely at a point
$x_0$, then it converges uniformly on the closed interval $[-c, c]$, where $c = |x_0|$.


Proof. This proof requires a straightforward application of the Weierstrass
M-Test. The details are requested in Exercise 6.5.3. □


For many applications, Theorem 6.5.2 is good enough. For instance, because
any $x \in (-R, R)$ is contained in the interior of a closed interval $[-c, c] \subseteq
(-R, R)$, it now follows that a power series that converges on an open interval
is necessarily continuous on this interval.

But what happens if we know that a series converges at an endpoint of
its interval of convergence? Does the good behavior of the series on $(-R, R)$
necessarily extend to the endpoint $x = R$? If the convergence of the series at
$x = R$ is absolute convergence, then we can again rely on Theorem 6.5.2 to
conclude that the series converges uniformly on the set $[-R, R]$. The remaining
interesting open question is what happens if a series converges conditionally
at a point $x = R$. We may still use Theorem 6.5.1 to conclude that we have
pointwise convergence on the interval $(-R, R]$, but more work is needed to
establish uniform convergence on compact sets containing $x = R$.


Abel’s Theorem 


We should remark that if the power series $g(x) = \sum_{n=0}^{\infty} a_n x^n$ converges
conditionally at $x = R$, then it is possible for it to diverge when $x = -R$. The
series

\[ \sum_{n=1}^{\infty} \frac{(-1)^n x^n}{n} \]

with $R = 1$ is an example. To keep our attention fixed on the convergent
endpoint, we will prove uniform convergence on the set $[0, R]$.
The first step in the argument is an estimate that should be compared to
Abel's Test for convergence of series, developed back in Chapter 2 (Exercise
2.7.13).

Lemma 6.5.3 (Abel's Lemma). Let $b_n$ satisfy $b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0$, and
let $\sum_{n=1}^{\infty} a_n$ be a series for which the partial sums are bounded. In other words,
assume there exists $A > 0$ such that

\[ |a_1 + a_2 + \cdots + a_n| \le A \]

for all $n \in \mathbf{N}$. Then, for all $n \in \mathbf{N}$,

\[ |a_1 b_1 + a_2 b_2 + a_3 b_3 + \cdots + a_n b_n| \le A b_1. \]

Proof. Let $s_n = a_1 + a_2 + \cdots + a_n$. Using the summation-by-parts formula
derived in Exercise 2.7.12, we can write

\begin{align*}
\left| \sum_{k=1}^{n} a_k b_k \right|
&= \left| s_n b_{n+1} + \sum_{k=1}^{n} s_k (b_k - b_{k+1}) \right| \\
&\le A b_{n+1} + \sum_{k=1}^{n} A (b_k - b_{k+1}) \\
&= A b_{n+1} + (A b_1 - A b_{n+1}) = A b_1.
\end{align*}

□
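Both the summation-by-parts identity and the resulting bound can be checked on a concrete pair of sequences. The sketch below is an illustration, assuming $a_k = (-1)^{k+1}$ (whose partial sums are bounded by $A = 1$) and $b_k = 1/k$; these choices and the function name `abel_check` are made here, not in the text. Exact rational arithmetic is used so that the identity can be tested with equality.

```python
from fractions import Fraction

def abel_check(a, b):
    """Compare sum(a_k b_k) with s_n b_{n+1} + sum(s_k (b_k - b_{k+1})).

    `a` holds a_1..a_n and `b` holds b_1..b_{n+1} (one extra entry).
    Returns (lhs, rhs, A), where A bounds the partial sums |s_k| for k <= n.
    """
    n = len(a)
    partial, running = [], Fraction(0)
    for term in a:
        running += term
        partial.append(running)                # s_1, ..., s_n
    lhs = sum(a[k] * b[k] for k in range(n))
    rhs = partial[-1] * b[n] + sum(
        partial[k] * (b[k] - b[k + 1]) for k in range(n)
    )
    A = max(abs(s) for s in partial)
    return lhs, rhs, A

n = 25
a = [Fraction((-1) ** k) for k in range(n)]         # 1, -1, 1, -1, ...
b = [Fraction(1, k) for k in range(1, n + 2)]       # 1, 1/2, ..., 1/(n+1)
lhs, rhs, A = abel_check(a, b)
```

Here `lhs` is a partial sum of the alternating harmonic series, and the lemma's bound $A b_1 = 1$ holds even though $\sum |a_k|$ is unbounded.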


It is worth observing that if $A$ were an upper bound on the partial sums
of $\sum |a_n|$ (note the absolute value bars), then the proof of Lemma 6.5.3 would
be a simple exercise in the triangle inequality. The point of the matter is that 
because we are only assuming conditional convergence, the triangle inequality 
is not going to be of any use in proving Abel’s Theorem, but we are now in 
possession of an inequality that we can use in its place. 

Theorem 6.5.4 (Abel's Theorem). Let $g(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series
that converges at the point $x = R > 0$. Then the series converges uniformly on
the interval $[0, R]$. A similar result holds if the series converges at $x = -R$.

Proof. To set the stage for an application of Lemma 6.5.3, we first write

\[ g(x) = \sum_{n=0}^{\infty} a_n x^n = \sum_{n=0}^{\infty} (a_n R^n) \left( \frac{x}{R} \right)^n. \]

Let $\epsilon > 0$. By the Cauchy Criterion for Uniform Convergence of Series (Theorem
6.4.4), we will be done if we can produce an $N$ such that $n > m \ge N$ implies

\[ \left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^{n} \right| < \epsilon. \tag{1} \]

Because we are assuming that $\sum_{n=0}^{\infty} a_n R^n$ converges, the Cauchy Criterion for
convergent series of real numbers guarantees that there exists an $N$ such that

\[ |a_{m+1} R^{m+1} + a_{m+2} R^{m+2} + \cdots + a_n R^n| < \frac{\epsilon}{2} \]

whenever $n > m \ge N$. But now, for any fixed $m \in \mathbf{N}$, we can apply Abel's
Lemma (Lemma 6.5.3) to the sequences obtained by omitting the first $m$ terms.
Using $\epsilon/2$ as a bound on the partial sums of $\sum_{j=1}^{\infty} a_{m+j} R^{m+j}$ and observing that
$(x/R)^{m+j}$ is monotone decreasing, an application of Abel's Lemma to equation
(1) yields

\[ \left| (a_{m+1} R^{m+1}) \left( \frac{x}{R} \right)^{m+1} + (a_{m+2} R^{m+2}) \left( \frac{x}{R} \right)^{m+2} + \cdots + (a_n R^n) \left( \frac{x}{R} \right)^{n} \right| \le \frac{\epsilon}{2} \left( \frac{x}{R} \right)^{m+1} < \epsilon. \]

□


The Success of Power Series 


An economical way to summarize the conclusions of Theorem 6.5.2 and Abel’s 
Theorem is with the following statement. 

Theorem 6.5.5. If a power series converges pointwise on the set $A \subseteq \mathbf{R}$, then
it converges uniformly on any compact set $K \subseteq A$.


Proof. A compact set contains both a maximum $x_1$ and a minimum $x_0$, which by
hypothesis must be in $A$. Abel's Theorem implies the series converges uniformly
on the interval $[x_0, x_1]$ and thus also on $K$. □


This fact leads to the desirable conclusion that a power series is continuous
at every point at which it converges. To make an argument for differentiability,
we would like to appeal to Theorem 6.4.3; however, this result has a
slightly more involved set of hypotheses. In order to conclude that a power
series $\sum_{n=0}^{\infty} a_n x^n$ is differentiable, and that term-by-term differentiation is
allowed, we need to know beforehand that the differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$
converges uniformly.

Theorem 6.5.6. If $\sum_{n=0}^{\infty} a_n x^n$ converges for all $x \in (-R, R)$, then the
differentiated series $\sum_{n=1}^{\infty} n a_n x^{n-1}$ converges at each $x \in (-R, R)$ as well.
Consequently, the convergence is uniform on compact sets contained in $(-R, R)$.

Proof. Exercise 6.5.5. □


We should point out that it is possible for a series to converge at an endpoint
$x = R$ but for the differentiated series to diverge at this point. The
series $\sum_{n=1}^{\infty} x^n/n$ has this property when $x = -1$. On the other hand, if the
differentiated series does converge at the point $x = R$, then Abel's Theorem
applies and the convergence of the differentiated series is uniform on compact
sets that contain $R$.

With all the pieces in place, we summarize the impressive conclusions of this 
section. 

Theorem 6.5.7. Assume

\[ f(x) = \sum_{n=0}^{\infty} a_n x^n \]

converges on an interval $A \subseteq \mathbf{R}$. The function $f$ is continuous on $A$ and
differentiable on any open interval $(-R, R) \subseteq A$. The derivative is given by

\[ f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}. \]

Moreover, $f$ is infinitely differentiable on $(-R, R)$, and the successive derivatives
can be obtained via term-by-term differentiation of the appropriate series.

Proof. The details for why $f$ is continuous have been discussed. Theorem 6.5.6
justifies the application of the Term-by-term Differentiability Theorem (Theorem
6.4.3), which verifies the formula for $f'$.

A differentiated power series is a power series in its own right, and Theorem
6.5.6 implies that, although the series may no longer converge at a particular
endpoint, the radius of convergence does not change. By induction, then, power
series are differentiable an infinite number of times. □
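Term-by-term differentiation can be sanity-checked on the geometric series. The sketch below assumes the standard identity $1/(1 - x) = \sum_{n=0}^{\infty} x^n$ for $|x| < 1$, so that the theorem predicts $\sum_{n=1}^{\infty} n x^{n-1} = 1/(1 - x)^2$ on $(-1, 1)$; the sample point $x = 1/3$, the number of terms, and the function name are illustrative choices made here.

```python
def differentiated_partial_sum(x: float, terms: int) -> float:
    """Partial sum of n * x**(n - 1) for n = 1, ..., terms."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))

x = 1.0 / 3.0                       # a sample point inside (-1, 1)
approx = differentiated_partial_sum(x, 200)

# Derivative of 1 / (1 - x), computed directly from the closed form.
closed_form = 1.0 / (1.0 - x) ** 2
```

Agreement between `approx` and `closed_form` at interior points is what the theorem guarantees; near the endpoints the partial sums converge more slowly, so more terms would be needed.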


Exercises 


Exercise 6.5.1. Consider the function g defined by the power series 





(a) Is g defined on $(-1,1)$? Is it continuous on this set? Is g defined on
$(-1,1]$? Is it continuous on this set? What happens on $[-1,1]$? Can
the power series for g(x) possibly converge for any other points $|x| > 1$?
Explain.

(b) For what values of x is g'(x) defined? Find a formula for g'.

Exercise 6.5.2. Find suitable coefficients $(a_n)$ so that the resulting power
series $\sum a_n x^n$ has the given properties, or explain why such a request
is impossible.

(a) Converges for every value of $x \in \mathbf{R}$.

(b) Diverges for every value of $x \in \mathbf{R}$.

(c) Converges absolutely for all $x \in [-1,1]$ and diverges off of this set.

(d) Converges conditionally at $x = -1$ and converges absolutely at $x = 1$.

(e) Converges conditionally at both $x = -1$ and $x = 1$.

Exercise 6.5.3. Use the Weierstrass M-Test to prove Theorem 6.5.2. 

Exercise 6.5.4 (Term-by-term Antidifferentiation). Assume
$f(x) = \sum_{n=0}^\infty a_n x^n$ converges on $(-R,R)$.

(a) Show

$$F(x) = \sum_{n=0}^\infty \frac{a_n}{n+1} x^{n+1}$$

is defined on $(-R,R)$ and satisfies $F'(x) = f(x)$.

(b) Antiderivatives are not unique. If g is an arbitrary function satisfying
$g'(x) = f(x)$ on $(-R,R)$, find a power series representation for g.

Exercise 6.5.5. (a) If s satisfies $0 < s < 1$, show $n s^{n-1}$ is bounded
for all $n \ge 1$.

(b) Given an arbitrary $x \in (-R,R)$, pick t to satisfy $|x| < t < R$. Use
this start to construct a proof for Theorem 6.5.6.


Exercise 6.5.6. Previous work on geometric series (Example 2.7.5) justifies
the formula

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots \quad \text{for all } |x| < 1.$$

Use the results about power series proved in this section to find values for
$\sum_{n=1}^\infty n/2^n$ and $\sum_{n=1}^\infty n^2/2^n$. The discussion in
Section 6.1 may be helpful.

Exercise 6.5.7. Let $\sum a_n x^n$ be a power series with $a_n \neq 0$, and
assume

$$L = \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right|$$

exists.

(a) Show that if $L \neq 0$, then the series converges for all x in
$(-1/L, 1/L)$. (The advice in Exercise 2.7.9 may be helpful.)

(b) Show that if $L = 0$, then the series converges for all $x \in \mathbf{R}$.

(c) Show that (a) and (b) continue to hold if L is replaced by the limit

$$L' = \lim_{n \to \infty} s_n \quad \text{where} \quad s_n = \sup \left\{ \left| \frac{a_{k+1}}{a_k} \right| : k \ge n \right\}.$$

(General properties of the limit superior are discussed in Exercise 2.4.7.)

Exercise 6.5.8. (a) Show that power series representations are unique. If
we have

$$\sum_{n=0}^\infty a_n x^n = \sum_{n=0}^\infty b_n x^n$$

for all x in an interval $(-R,R)$, prove that $a_n = b_n$ for all
$n = 0, 1, 2, \ldots$.

(b) Let $f(x) = \sum_{n=0}^\infty a_n x^n$ converge on $(-R,R)$, and assume
$f'(x) = f(x)$ for all $x \in (-R,R)$ and $f(0) = 1$. Deduce the values of $a_n$.

Exercise 6.5.9. Review the definitions and results from Section 2.8 concerning
products of series and Cauchy products in particular. At the end of Section 2.9,
we mentioned the following result: If both $\sum a_n$ and $\sum b_n$ converge
conditionally to A and B respectively, then it is possible for the Cauchy
product,

$$\sum d_n \quad \text{where} \quad d_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0,$$

to diverge. However, if $\sum d_n$ does converge, then it must converge to AB.
To prove this, set

$$f(x) = \sum a_n x^n, \quad g(x) = \sum b_n x^n, \quad \text{and} \quad h(x) = \sum d_n x^n.$$

Use Abel’s Theorem and the result in Exercise 2.8.7 to establish this result.

Exercise 6.5.10. Let $g(x) = \sum_{n=0}^\infty b_n x^n$ converge on $(-R,R)$,
and assume $(x_n) \to 0$ with $x_n \neq 0$. If $g(x_n) = 0$ for all
$n \in \mathbf{N}$, show that g(x) must be identically zero on all of $(-R,R)$.

Exercise 6.5.11. A series $\sum_{n=0}^\infty a_n$ is said to be Abel-summable
to L if the power series

$$f(x) = \sum_{n=0}^\infty a_n x^n$$

converges for all $x \in [0,1)$ and $L = \lim_{x \to 1^-} f(x)$.

(a) Show that any series that converges to a limit L is also Abel-summable
to L.

(b) Show that $\sum_{n=0}^\infty (-1)^n$ is Abel-summable and find the sum.

6.6 Taylor Series 

Our study of power series has led to some enthusiastic conclusions about the
nature of functions of the form

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots.$$

Despite their infinite character, power series can be manipulated more or less as
though they are polynomials. On its interval of convergence, a power series is
continuous and infinitely differentiable, and successive derivatives or
antiderivatives can be computed by performing the desired operation on each
individual term in the series, just as it is done for polynomials.

In Section 6.1 we informally encountered the powerful idea that familiar
functions such as arctan(x) and $\sqrt{1+x}$ can be represented as power
series. This is a game-changing revelation. If a function can be represented
as a power series, and a power series can be treated like a polynomial, then
vast new possibilities are suddenly available for the kinds of calculations
that can be undertaken. Given this state of affairs, it is natural to wonder
whether all of the well-behaved (i.e., infinitely differentiable) functions of
calculus might have representations as power series.

In the examples and exercises in this section, we will assume the familiar 
properties of the trigonometric, inverse trigonometric, exponential, and loga- 
rithmic functions. Rigorously defining these functions is an important exercise 
in analysis. In fact, one of the most common methods for providing proper def- 
initions is through power series, a point of view that is explored in Section 8.4. 
The point of this discussion, however, is to come at this question from the other 
direction. Assuming we are in possession of an infinitely differentiable function
such as sin(x), can we find suitable coefficients $a_n$ so that

$$\sin(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \cdots$$

for at least some nonzero values of x?

Manipulating Series 

In Section 6.1 we generated several new series representations starting from the
formula

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + x^4 + \cdots \quad \text{for all } |x| < 1, \tag{1}$$


proved in Example 2.7.5. At the time, we were not concerned with supply- 
ing rigorous proofs, but we have since done the bulk of the work necessary to 
confidently assert that the manipulations in Section 6.1 are perfectly valid. 


Example 6.6.1. Theorem 6.5.7 applied to equation (1) gives

$$\frac{1}{(1-x)^2} = 1 + 2x + 3x^2 + 4x^3 + 5x^4 + \cdots, \quad \text{for all } |x| < 1.$$


What about the series we generated for arctan(x)? The substitution of $-x^2$
for x in (1) doesn’t cause any problem:

$$\frac{1}{1+x^2} = 1 - x^2 + x^4 - x^6 + x^8 - \cdots \quad \text{for all } |x| < 1.$$

The content of Exercise 6.5.4 is that we can take the term-by-term
antiderivative of this series and arrive at an antiderivative for
$1/(1+x^2)$. Noting that arctan(0) = 0, it follows that

$$\arctan(x) = x - \frac{1}{3}x^3 + \frac{1}{5}x^5 - \frac{1}{7}x^7 + \cdots \tag{2}$$

for all $x \in (-1,1)$. In fact, this formula is also valid for $x = \pm 1$.
(Exercise 6.6.1.) Similar methods can be used to find series representations
for functions such as $\log(1+x)$ and $x/(1+x^2)^2$.
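
The two manipulations in Example 6.6.1 can be checked numerically. The
following Python sketch compares partial sums of the differentiated geometric
series and of the arctangent series with the closed forms they are supposed to
represent. The function names and the choice of 200 terms are illustrative
assumptions, not part of the text, and a numerical match is of course only
evidence, not a proof.

```python
import math

def differentiated_geometric(x: float, terms: int) -> float:
    """Partial sum of 1 + 2x + 3x^2 + ..., the term-by-term derivative of (1)."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))

def arctan_series(x: float, terms: int) -> float:
    """Partial sum of x - x^3/3 + x^5/5 - ..., the term-by-term antiderivative."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

for x in (-0.5, 0.25, 0.5):
    # Each pair of numbers should agree to many decimal places for |x| < 1.
    print(x, differentiated_geometric(x, 200), 1 / (1 - x) ** 2)
    print(x, arctan_series(x, 200), math.atan(x))
```

Points close to the endpoints converge much more slowly, so far more terms
would be needed there.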


Taylor’s Formula for the Coefficients 

Manipulating old series to produce new ones was a well-honed craft in the 
17th and 18th centuries, but there also emerged a formula for producing the 
coefficients from “scratch” — a recipe for generating a power series representation 
using only the function in question and its derivatives. The technique is named 
after the mathematician Brook Taylor (1685-1731) who published it in 1715, 
although it was certainly known previous to this date. 

Given an infinitely differentiable function f defined on some interval centered
at zero, the idea is to assume that f has a power series expansion and deduce
what the coefficients must be.

Theorem 6.6.2 (Taylor’s Formula). Let

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + \cdots \tag{3}$$

be defined on some nontrivial interval centered at zero. Then,

$$a_n = \frac{f^{(n)}(0)}{n!}.$$

Proof. Exercise 6.6.3. □


Let’s use Taylor’s formula to produce the so-called Taylor series for sin(x).
For the constant term we get $a_0 = \sin(0) = 0$. Then, $a_1 = \cos(0) = 1$,
$a_2 = -\sin(0)/2! = 0$, and $a_3 = -\cos(0)/3! = -1/3!$. Continuing on, we are
led to the series

$$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots.$$
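
The coefficient computation above is mechanical enough to sketch in a few
lines of Python. The sketch assumes the familiar fact that the derivatives of
sin at zero cycle through 0, 1, 0, -1; the helper name
`sin_taylor_coefficient` is hypothetical.

```python
import math

def sin_taylor_coefficient(n: int) -> float:
    """Return a_n = f^(n)(0) / n! for f = sin."""
    # Derivatives of sin evaluated at 0: sin(0), cos(0), -sin(0), -cos(0), repeat.
    derivative_at_zero = (0.0, 1.0, 0.0, -1.0)[n % 4]
    return derivative_at_zero / math.factorial(n)

coefficients = [sin_taylor_coefficient(n) for n in range(8)]
print(coefficients)
```

Only the odd-indexed coefficients are nonzero, and their signs alternate, in
agreement with the series just written down.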


So can we say that this series equals sin(x)? Well, we need to be very clear about 
what we have proved to this point. To derive Taylor’s formula, we assumed that 
f actually had a power series representation. The conclusion is that if f can be
expressed in the form

$$f(x) = \sum_{n=0}^\infty a_n x^n,$$

then it must be that

$$a_n = \frac{f^{(n)}(0)}{n!}.$$

But what about the converse question? Assume f is infinitely differentiable
in a neighborhood of zero. If we let

$$a_n = \frac{f^{(n)}(0)}{n!},$$

does the resulting series

$$\sum_{n=0}^\infty a_n x^n$$

converge to f(x) on some nontrivial set of points? Does it converge at all? If
it does converge, we know that the limit function is an infinitely differentiable
function whose derivatives at zero are exactly the same as the derivatives of f.
Is it possible for this limit to be different from f? In other words, might the
Taylor series of a function converge to the wrong thing?

Let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

The polynomial $S_N(x)$ is a partial sum of the Taylor series expansion for the
function f(x). Thus, we are interested in whether or not

$$\lim_{N \to \infty} S_N(x) = f(x)$$

for some values of x besides zero.


Lagrange’s Remainder Theorem 

A powerful tool for analyzing this question was provided by Joseph Louis
Lagrange (1736-1813). The idea is to consider the difference

$$E_N(x) = f(x) - S_N(x),$$

which represents the error between f and the partial sum $S_N$.

Theorem 6.6.3 (Lagrange’s Remainder Theorem). Let f be differentiable
N + 1 times on $(-R,R)$, define $a_n = f^{(n)}(0)/n!$ for $n = 0, 1, \ldots, N$,
and let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

Given $x \neq 0$ in $(-R,R)$, there exists a point c satisfying $|c| < |x|$
where the error function $E_N(x) = f(x) - S_N(x)$ satisfies

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1}.$$



Before embarking on a proof, let’s examine the significance of this result.
Proving $S_N(x) \to f(x)$ is equivalent to showing $E_N(x) \to 0$. There are
three components to the expression for $E_N(x)$. In the denominator, we have
$(N+1)!$, which helps to make $E_N$ small as N tends to infinity. In the
numerator, we have $x^{N+1}$, which potentially grows depending on the size of
x. Thus, we should expect that a Taylor series is less likely to converge the
farther x is chosen from the origin. Finally, we have $f^{(N+1)}(c)$, which is
a bit of a mystery. For functions with straightforward derivatives, this term
can often be handled using a suitable upper bound.


Example 6.6.4. Consider the Taylor series for sin(x) generated earlier. How
well does

$$S_5(x) = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5$$

approximate sin(x) on the interval $[-2,2]$? Lagrange’s Remainder Theorem
asserts that the difference between these two functions is

$$E_5(x) = \sin(x) - S_5(x) = \frac{-\sin(c)}{6!} x^6$$

for some c in the interval $(-|x|, |x|)$. Not knowing the value of c, we can
still be quite certain that $|\sin(c)| \le 1$. Because $x \in [-2,2]$, we have

$$|E_5(x)| \le \frac{2^6}{6!} \approx .089.$$
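
To see how conservative this estimate is, one can sample the actual error on
$[-2,2]$ and compare it with the Lagrange bound $2^6/6!$. The Python sketch
below does this on an evenly spaced grid; the grid size and the names `s5`,
`bound`, and `worst` are illustrative assumptions.

```python
import math

def s5(x: float) -> float:
    """The degree-5 Taylor polynomial of sin centered at zero."""
    return x - x ** 3 / math.factorial(3) + x ** 5 / math.factorial(5)

# Lagrange bound from the example: |E_5(x)| <= 2^6 / 6! on [-2, 2].
bound = 2 ** 6 / math.factorial(6)

# Sample the true error on an evenly spaced grid over [-2, 2].
grid = [-2 + 4 * i / 4000 for i in range(4001)]
worst = max(abs(math.sin(x) - s5(x)) for x in grid)

print("Lagrange bound:", bound)
print("largest sampled error:", worst)
```

The sampled error stays below the bound, as the theorem guarantees, since the
bound replaces the unknown factor $|\sin(c)|$ by its largest possible value.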


To prove that $S_N(x)$ converges uniformly to sin(x) on $[-2,2]$, we observe
that the $f^{(N+1)}(c)$ term in the Lagrange formula will never exceed 1 in
absolute value. Thus,

$$|E_N(x)| = \left| \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1} \right| \le \frac{1}{(N+1)!} 2^{N+1}$$

for $x \in [-2,2]$. Because factorials grow significantly faster than
exponentials, it follows that $E_N(x) \to 0$ uniformly on $[-2,2]$.

Replacing the constant 2 with an arbitrary constant R has no effect on the
validity of the argument, and so the Taylor series converges uniformly to sin(x)
on every interval of the form $[-R,R]$.
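
The claim that factorials eventually dominate exponentials, whatever the value
of R, can be illustrated with a short Python sketch. The helper name
`lagrange_bound` and the sample values of R and N are assumptions made for the
illustration.

```python
import math

def lagrange_bound(R: float, N: int) -> float:
    """Upper bound R^(N+1) / (N+1)! for |E_N(x)| when |x| <= R and |f^(N+1)| <= 1."""
    return R ** (N + 1) / math.factorial(N + 1)

for R in (2, 10):
    # The bound may grow at first (for R = 10 it does), but it eventually shrinks.
    print(R, [lagrange_bound(R, N) for N in (5, 10, 20, 50)])
```

For larger R the bound takes longer to start decreasing, which matches the
remark that convergence is harder to obtain far from the origin.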


Proof of Lagrange’s Remainder Theorem: The Taylor coefficients are chosen
so that the function f and the polynomial $S_N$ have the same derivatives at
zero, at least up through the Nth derivative, after which $S_N$ becomes the zero
function. In other words, $f^{(n)}(0) = S_N^{(n)}(0)$ for all $0 \le n \le N$,
which implies the error function $E_N(x) = f(x) - S_N(x)$ satisfies

$$E_N^{(n)}(0) = 0 \quad \text{for all } n = 0, 1, 2, \ldots, N.$$

The key ingredient in this argument is the Generalized Mean Value Theorem
(Theorem 5.3.5) from Chapter 5. To simplify notation, let’s assume $x > 0$ and
apply the Generalized Mean Value Theorem to the functions $E_N(x)$ and
$x^{N+1}$ on the interval $[0,x]$. Thus, there exists a point $x_1 \in (0,x)$
such that

$$\frac{E_N(x)}{x^{N+1}} = \frac{E_N'(x_1)}{(N+1)x_1^N}.$$

Now apply the Generalized Mean Value Theorem to the functions $E_N'(x)$ and
$(N+1)x^N$ on the interval $[0,x_1]$ to get that there exists a point
$x_2 \in (0,x_1)$ where

$$\frac{E_N(x)}{x^{N+1}} = \frac{E_N'(x_1)}{(N+1)x_1^N} = \frac{E_N''(x_2)}{(N+1)N x_2^{N-1}}.$$

Continuing in this manner we find

$$\frac{E_N(x)}{x^{N+1}} = \frac{E_N^{(N+1)}(x_{N+1})}{(N+1)!}$$

where $x_{N+1} \in (0,x_N) \subseteq \cdots \subseteq (0,x)$. Now set
$c = x_{N+1}$. Because $S_N^{(N+1)}(x) = 0$, we have
$E_N^{(N+1)}(x) = f^{(N+1)}(x)$ and it follows that

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1}$$

as desired. □


Taylor Series Centered at $a \neq 0$

Throughout this chapter we have focused our attention on series expansions
centered at zero, but there is nothing special about zero other than notational
simplicity. If f is defined in some neighborhood of $a \in \mathbf{R}$ and
infinitely differentiable at a, then the Taylor series expansion around a takes
the form

$$\sum_{n=0}^\infty c_n (x-a)^n \quad \text{where} \quad c_n = \frac{f^{(n)}(a)}{n!}.$$

Setting $E_N(x) = f(x) - S_N(x)$ as usual, Lagrange’s Remainder Theorem in this
case says that there exists a value c between a and x where

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} (x-a)^{N+1}.$$


In Exercise 6.6.9, we derive an alternate remainder formula due to Cauchy that
requires these more general expansions for its derivation.

A Counterexample


Lagrange’s Remainder Theorem is extremely useful for determining how well the
partial sums of the Taylor series approximate the original function, but it leaves
unresolved the central question of whether or not the Taylor series necessarily
converges to the function that generated it. The appearance of $f^{(N+1)}(c)$
in the error formula makes any general statement impossible. The Cauchy form of
the remainder just mentioned provides another way to represent the error between
the partial sum $S_N(x)$ and the function f(x), and there are others still, but
none lend themselves to a proof that $S_N \to f$. This is because no such proof
exists! Let

$$g(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$


Computing the Taylor coefficients for this function, it’s clear that
$a_0 = g(0) = 0$. To compute $a_1$ we write

$$a_1 = g'(0) = \lim_{x \to 0} \frac{g(x) - g(0)}{x - 0} = \lim_{x \to 0} \frac{e^{-1/x^2}}{x} = \lim_{x \to 0} \frac{1/x}{e^{1/x^2}},$$

where both numerator and denominator tend to $\infty$ as x approaches zero.
Applying the $\infty/\infty$ version of L’Hospital’s Rule (Theorem 5.3.8) we see

$$a_1 = \lim_{x \to 0} \frac{-1/x^2}{e^{1/x^2}(-2/x^3)} = \lim_{x \to 0} \frac{x}{2e^{1/x^2}} = 0.$$

This tells us that g is flat at the origin. In Exercise 6.6.6, we outline the rest of
the proof showing that $g^{(n)}(0) = 0$ for all $n \in \mathbf{N}$; in other
words, g is extremely flat at the origin.

The implications of this example are highly significant. The function g is 
infinitely differentiable, and every one of its Taylor coefficients is equal to zero. 
By default, then, its Taylor series converges uniformly on all of R to the zero 
function. But other than at x = 0, g(x) is never equal to zero. The Taylor series 
for g(x) converges, but it does not converge to g(x) except at the center point 
x = 0. The unmistakable conclusion is that not every infinitely differentiable 
function can be represented by its Taylor series. 
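
A small Python sketch makes the gap between g and its Taylor series visible.
It evaluates g at a few points near zero, compares with the value 0 of the
Taylor series, and also prints $g(x)/x^{10}$ to suggest how flat g is at the
origin. The sample points and the exponent 10 are arbitrary choices made for
illustration.

```python
import math

def g(x: float) -> float:
    """The counterexample: exp(-1/x^2) away from zero, and 0 at zero."""
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

for x in (0.5, 0.2, 0.1):
    # Every Taylor coefficient of g at 0 is zero, so the series sums to 0.
    taylor_series_value = 0.0
    print(x, g(x), taylor_series_value, g(x) / x ** 10)
```

At each sample point g(x) is positive while the series gives 0, and the ratio
$g(x)/x^{10}$ is already tiny at x = 0.1, consistent with g vanishing at the
origin faster than any power of x.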


Exercises 

Exercise 6.6.1. The derivation in Example 6.6.1 shows the Taylor series for
arctan(x) is valid for all $x \in (-1,1)$. Notice, however, that the series also
converges when $x = 1$. Assuming that arctan(x) is continuous, explain why the
value of the series at $x = 1$ must necessarily be arctan(1). What interesting
identity do we get in this case?

Exercise 6.6.2. Starting from one of the previously generated series in this
section, use manipulations similar to those in Example 6.6.1 to find Taylor
series representations for each of the following functions. For precisely what
values of x is each series representation valid?

(a) $x\cos(x^2)$

(b) $x/(1+4x^2)^2$

(c) $\log(1+x^2)$

Exercise 6.6.3. Derive the formula for the Taylor coefficients given in 
Theorem 6.6.2. 


Exercise 6.6.4. Using only Lagrange’s Remainder Theorem (and no references
to Abel’s Theorem) prove

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots = \log(2).$$


Exercise 6.6.5. (a) Generate the Taylor coefficients for the exponential
function $f(x) = e^x$, and then prove that the corresponding Taylor series
converges uniformly to $e^x$ on any interval of the form $[-R,R]$.

(b) Verify the formula $f'(x) = e^x$.

(c) Use a substitution to generate the series for $e^{-x}$, and then informally
calculate $e^x \cdot e^{-x}$ by multiplying together the two series and
collecting common powers of x.

Exercise 6.6.6. Review the proof that $g'(0) = 0$ for the function

$$g(x) = \begin{cases} e^{-1/x^2} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0, \end{cases}$$

introduced at the end of this section.

(a) Compute $g'(x)$ for $x \neq 0$. Then use the definition of the derivative
to find $g''(0)$.

(b) Compute $g''(x)$ and $g'''(x)$ for $x \neq 0$. Use these observations and
invent whatever notation is needed to give a general description for the nth
derivative $g^{(n)}(x)$ at points different from zero.

(c) Construct a general argument for why $g^{(n)}(0) = 0$ for all
$n \in \mathbf{N}$.


Exercise 6.6.7. Find an example of each of the following or explain why no
such function exists.

(a) An infinitely differentiable function g(x) on all of R with a Taylor series
that converges to g(x) only for $x \in (-1,1)$.

(b) An infinitely differentiable function h(x) with the same Taylor series as
sin(x) but such that $h(x) \neq \sin(x)$ for all $x \neq 0$.

(c) An infinitely differentiable function f(x) on all of R with a Taylor series
that converges to f(x) if and only if $x \le 0$.

Exercise 6.6.8. Here is a weaker form of Lagrange’s Remainder Theorem whose
proof is arguably more illuminating than the one for the stronger result.

(a) First establish a lemma: If g and h are differentiable on $[0,x]$ with
$g(0) = h(0)$ and $g'(t) \le h'(t)$ for all $t \in [0,x]$, then
$g(t) \le h(t)$ for all $t \in [0,x]$.

(b) Let f, $S_N$, and $E_N$ be as in Theorem 6.6.3, and take $0 < x < R$. If
$|f^{(N+1)}(t)| \le M$ for all $t \in [0,x]$, show

$$|E_N(x)| \le \frac{M x^{N+1}}{(N+1)!}.$$


Exercise 6.6.9 (Cauchy’s Remainder Theorem). Let f be differentiable
N + 1 times on $(-R,R)$. For each $a \in (-R,R)$, let $S_N(x,a)$ be the partial
sum of the Taylor series for f centered at a; in other words, define

$$S_N(x,a) = \sum_{n=0}^N c_n (x-a)^n \quad \text{where} \quad c_n = \frac{f^{(n)}(a)}{n!}.$$

Let $E_N(x,a) = f(x) - S_N(x,a)$. Now fix $x \neq 0$ in $(-R,R)$ and consider
$E_N(x,a)$ as a function of a.

(a) Find $E_N(x,x)$.

(b) Explain why $E_N(x,a)$ is differentiable with respect to a, and show

$$E_N'(x,a) = \frac{-f^{(N+1)}(a)}{N!} (x-a)^N.$$

(c) Show

$$E_N(x) = E_N(x,0) = \frac{f^{(N+1)}(c)}{N!} (x-c)^N x$$

for some c between 0 and x. This is Cauchy’s form of the remainder for
Taylor series centered at the origin.


Exercise 6.6.10. Consider $f(x) = 1/\sqrt{1-x}$.

(a) Generate the Taylor series for f centered at zero, and use Lagrange’s
Remainder Theorem to show the series converges to f on $[0,1/2]$. (The
case $x < 1/2$ is more straightforward while $x = 1/2$ requires some extra
care.) What happens when we attempt this with $x > 1/2$?

(b) Use Cauchy’s Remainder Theorem proved in Exercise 6.6.9 to show the
series representation for f holds on $[0,1)$.


6.7 The Weierstrass Approximation Theorem 

Karl Weierstrass’s name is attached to a number of significant results discussed
already. The Bolzano-Weierstrass Theorem was fundamental to understanding
the relationship between convergence, completeness, and compactness worked
out in the early chapters. In this chapter, the Weierstrass M-Test emerged
as the primary tool for demonstrating uniform convergence of infinite series.
As discussed in Section 5.4, Weierstrass was also responsible for one of the
earliest examples of a continuous, nowhere differentiable function, making this
discovery in 1872.

In 1885, Weierstrass proved a result that served as an interesting counter- 
point to his nowhere differentiable function. This theorem, which also bears his 
name, would become the catalyst for a new branch of analysis called approxi- 
mation theory. 

Theorem 6.7.1 (Weierstrass Approximation Theorem). Let
$f : [a,b] \to \mathbf{R}$ be continuous. Given $\epsilon > 0$, there exists a
polynomial p(x) satisfying

$$|f(x) - p(x)| < \epsilon$$

for all $x \in [a,b]$.


A restatement of the Weierstrass Approximation Theorem (WAT) without 
all the symbols is that every continuous function on a closed interval can be 
uniformly approximated by a polynomial. 


Exercise 6.7.1. Assuming WAT, show that if f is continuous on $[a,b]$, then
there exists a sequence $(p_n)$ of polynomials such that $p_n \to f$ uniformly
on $[a,b]$.

Our work in the previous section provides a nice starting point for
understanding what WAT is saying. Given a function such as sin(x), we saw in
Example 6.6.4 that the resulting Taylor series converges uniformly on compact
sets back to sin(x). Because the partial sums of a Taylor series are polynomials,
this example constitutes a proof of WAT in the very special case of
$f(x) = \sin(x)$. It should be clear, however, that Taylor series won’t work in
general. To construct a Taylor series, we need f to be an infinitely
differentiable function (and even then the Taylor series might fail to
approximate f), while WAT requires only that f be continuous.

So should we be surprised that such a theorem is true? This is hard to say.
On a purely intuitive level, if we consider a smooth curve like
$f(x) = \sqrt{1-x}$ on $[-1,1]$, then it doesn’t take too much imagination to
believe that a polynomial might exist that tracks closely with $\sqrt{1-x}$ as
x moves over the domain. But
one of the lessons of Section 5.4 is that a continuous function does not have to 
be smooth. Although it is not Weierstrass’s original example, a careful look at 
the nowhere differentiable function shown in Figure 5.7 makes the point just as 
well. Despite the unimaginably jagged nature of the graph, according to WAT, 
it is still possible to find a polynomial that uniformly approximates this unruly 
function to any prescribed degree of accuracy. 


Interpolation 

Weierstrass’s theorem deals with approximating polynomials, but a good way to 
get a feel for the content of this result is to temporarily replace the polynomials 
in WAT with the collection of all continuous, piecewise-linear functions.

Figure 6.6: Polygonal approximation of $f(x) = \sqrt{1-x}$.


Definition 6.7.2. A continuous function $\phi : [a,b] \to \mathbf{R}$ is
polygonal if there is a partition

$$a = x_0 < x_1 < x_2 < \cdots < x_n = b$$

of $[a,b]$ such that $\phi$ is linear on each subinterval $[x_{i-1}, x_i]$,
where $i = 1, \ldots, n$.

The term “interpolation” refers to the process of finding a function whose 
graph passes through a given set of points. If, for example, we take the points 

<°' 1) ' i Iff (iA (1 ' 0) 


then there is an obvious polygonal function that interpolates these points: it is
just the function we get by connecting the points with line segments. Now these
four points all lie on the graph of $f(x) = \sqrt{1-x}$, and notice that the
resulting polygonal interpolation does a reasonable job of imitating the graph
of f. (See Figure 6.6.) This is not an accident.

Theorem 6.7.3. Let $f : [a,b] \to \mathbf{R}$ be continuous. Given
$\epsilon > 0$, there exists a polygonal function $\phi$ satisfying

$$|f(x) - \phi(x)| < \epsilon$$

for all $x \in [a,b]$.


Exercise 6.7.2. Prove Theorem 6.7.3. 


Notice how similar Theorem 6.7.3 is to WAT, the only difference being that 
we have substituted a polygonal function in place of the polynomial. 
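
Theorem 6.7.3 can be explored numerically. The Python sketch below builds the
polygonal interpolant of $\sqrt{1-x}$ at equally spaced nodes on $[0,1]$ and
estimates the largest gap on a fine grid. The interval, the equal spacing, the
sample count, and the names `polygonal` and `max_error` are all assumptions
made for this illustration; the theorem itself does not prescribe how the
partition is chosen.

```python
import math

def polygonal(f, a, b, n):
    """Return the piecewise-linear interpolant of f at n + 1 equally spaced nodes."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]

    def phi(x):
        # Locate the subinterval containing x, clamping so that x = b is handled.
        i = min(int((x - a) / (b - a) * n), n - 1)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1 - t) * ys[i] + t * ys[i + 1]

    return phi

def max_error(f, a, b, n, samples=20001):
    """Estimate sup |f - phi| on [a, b] by sampling on a fine grid."""
    phi = polygonal(f, a, b, n)
    points = [a + (b - a) * j / (samples - 1) for j in range(samples)]
    return max(abs(f(x) - phi(x)) for x in points)

f = lambda x: math.sqrt(1 - x)
for n in (4, 10, 100):
    print(n, max_error(f, 0.0, 1.0, n))
```

The sampled error shrinks as the partition is refined, although slowly near
x = 1, where the graph of $\sqrt{1-x}$ becomes vertical.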

The strategy for the proof of Theorem 6.7.3 is to first choose an appropriate
number of points on the graph of f, and then show that the resulting polygonal
interpolation of these points does the trick. It’s not unreasonable to suspect
that a similar strategy might lead to a proof of the Weierstrass Approximation
Theorem. Can we prove WAT by constructing a polynomial interpolation of
points on the graph of f? Well, no as it turns out, but this is not so easy to see.

Exercise 6.7.3. (a) Find the second degree polynomial
$p(x) = q_0 + q_1 x + q_2 x^2$ that interpolates the three points $(-1,1)$,
$(0,0)$, and $(1,1)$ on the graph of $g(x) = |x|$. Sketch g(x) and p(x) over
$[-1,1]$ on the same set of axes.

(b) Find the fourth degree polynomial that interpolates $g(x) = |x|$ at the
points $x = -1, -1/2, 0, 1/2$, and 1. Add a sketch of this polynomial to
the graph from (a).

The previous exercise may still give the impression that a polynomial inter- 
polation approach is going to lead to a proof of WAT, but that isn’t the case. 
Continuing on with larger and larger numbers of equally spaced points yields 
high degree polynomials that oscillate very rapidly and actually do a poor job of 
approximating g between the interpolating points. In fact, it turns out that the
resulting sequence of polynomials only converges to g(x) when $x = -1, 0$, or 1.
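
The poor behavior of equally spaced polynomial interpolation is easy to
observe numerically. The Python sketch below evaluates the interpolating
polynomial of $|x|$ directly from the Lagrange basis formula and samples the
largest deviation on $[-1,1]$. The function names, the sample count, and the
particular degrees are illustrative assumptions, and the direct formula is
chosen for transparency rather than efficiency.

```python
def lagrange_interpolant(nodes, values):
    """Return the polynomial through (nodes[j], values[j]) via the Lagrange basis."""
    def p(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(nodes, values)):
            basis = 1.0
            for k, xk in enumerate(nodes):
                if k != j:
                    basis *= (x - xk) / (xj - xk)
            total += yj * basis
        return total
    return p

def max_deviation(degree, samples=2001):
    """Largest sampled gap between |x| and its equally spaced interpolant on [-1, 1]."""
    nodes = [-1 + 2 * i / degree for i in range(degree + 1)]
    p = lagrange_interpolant(nodes, [abs(t) for t in nodes])
    points = [-1 + 2 * j / (samples - 1) for j in range(samples)]
    return max(abs(abs(x) - p(x)) for x in points)

for degree in (2, 4, 10, 20):
    print(degree, max_deviation(degree))
```

The deviation is modest for the low degrees in Exercise 6.7.3 but becomes very
large by degree 20, with the worst oscillations near the ends of the interval.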

Approximating the Absolute Value Function 

Having reached a temporary dead end, we need to back up a bit and take a 
different turn. Let’s return to Theorem 6.7.3 which asserts that every continuous 
function can be uniformly approximated by a polygonal function. This should 
feel like a promising first step toward a proof of WAT and indeed it is. If we can 
find a way to approximate an arbitrary polygonal function with polynomials, 
then a triangle inequality argument would finish the proof. 

Before we get too excited about this line of attack, keep in mind that the 
absolute value function from Exercise 6.7.3 is an example of a polygonal function 
and we are currently unsure how to produce polynomials to approximate it. 
What has changed, however, is our motivation for doing so. A moment’s thought 
reveals that handling the absolute value function might be the key to solving 
the whole problem. Why is this? Every polygonal function is made up of 
line segments that meet at corners. If we can find polynomials that uniformly 
approximate $g(x) = |x|$ with its right-angled corner at the origin, then with a
little cleverness we ought to be able to handle more general polygonal functions 
and prove WAT using Theorem 6.7.3. 


Cauchy’s Remainder Formula for Taylor Series 


One elegant way to show $g(x) = |x|$ is the uniform limit of polynomials is via
Taylor series, which is a bit surprising given that $|x|$ is not differentiable.
The trick, as we will see, is to start by computing the Taylor series for the
infinitely differentiable function $\sqrt{1-x}$.

Exercise 6.7.4. Show that $f(x) = \sqrt{1-x}$ has Taylor series coefficients
$a_n$ where $a_0 = 1$ and

$$a_n = \frac{-1 \cdot 3 \cdot 5 \cdots (2n-3)}{2 \cdot 4 \cdot 6 \cdots 2n}$$

for $n \ge 1$.


Our goal is to show

$$\sqrt{1-x} = \sum_{n=0}^\infty a_n x^n \tag{1}$$

for all $x \in [-1,1]$ by showing that the error function

$$E_N(x) = \sqrt{1-x} - \sum_{n=0}^N a_n x^n$$

tends to 0 as $N \to \infty$. To this point, Lagrange’s Remainder Theorem has
been the featured tool for jobs like this, but it comes up short in this case.
To see exactly why, fix $x \in (0,1]$. Then Theorem 6.6.3 asserts that there
exists a $c \in (0,x)$ (dependent on N) such that

$$E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!} x^{N+1}
= \frac{1}{(N+1)!} \cdot \frac{-1 \cdot 3 \cdot 5 \cdots (2N-1)}{2^{N+1}(1-c)^{N+1/2}} \, x^{N+1}
= \frac{-1 \cdot 3 \cdot 5 \cdots (2N-1)}{2 \cdot 4 \cdot 6 \cdots (2N+2)} \left( \frac{x}{1-c} \right)^{N+1/2} x^{1/2}.$$

The problem is that $x/(1-c)$ is largest when $c = x$, and
$(x/(1-x))^{N+1/2}$ goes exponentially to infinity when x is bigger than 1/2.
This doesn’t mean our Taylor series is only valid on $[0,1/2]$; it just means
we are using the wrong remainder formula.
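
Numerical evidence supports the claim that the failure lies with the remainder
formula and not with the series. The Python sketch below generates the
coefficients from Exercise 6.7.4 through the recurrence
$a_n = a_{n-1}(2n-3)/(2n)$ for $n \ge 2$, which follows from the product
formula, and compares partial sums with $\sqrt{1-x}$ at points beyond 1/2. The
function names and the choice of N = 200 are assumptions made for the
illustration.

```python
import math

def sqrt_taylor_coefficients(N):
    """Coefficients a_0, ..., a_N of the Taylor series of sqrt(1 - x) at zero."""
    coefficients = [1.0]
    if N >= 1:
        coefficients.append(-0.5)
    for n in range(2, N + 1):
        # a_n = a_{n-1} * (2n - 3) / (2n), read off from the product formula.
        coefficients.append(coefficients[-1] * (2 * n - 3) / (2 * n))
    return coefficients

def partial_sum(x, N):
    """Evaluate S_N(x) = a_0 + a_1 x + ... + a_N x^N."""
    return sum(a * x ** n for n, a in enumerate(sqrt_taylor_coefficients(N)))

for x in (0.6, 0.9, -0.9):
    print(x, partial_sum(x, 200), math.sqrt(1 - x))
```

The partial sums track $\sqrt{1-x}$ closely even at x = 0.9, where the
Lagrange estimate above gives no useful information.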


Exercise 6.7.5. (a) Follow the advice in Exercise 6.6.9 to prove the Cauchy
form of the remainder:

$$E_N(x) = \frac{f^{(N+1)}(c)}{N!} (x-c)^N x$$

for some c between 0 and x.

(b) Use this result to prove equation (1) is valid for all $x \in (-1,1)$.

Although Cauchy’s Remainder Theorem doesn’t tell us so, equation (1) is
also valid at $x = \pm 1$.

Exercise 6.7.6. (a) Let

$$c_n = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2 \cdot 4 \cdot 6 \cdots 2n}$$

for $n \ge 1$. Show $c_n < \frac{2}{\sqrt{2n+1}}$.

(b) Use (a) to show that $\sum_{n=0}^\infty a_n$ converges (absolutely, in
fact) where $(a_n)$ is the sequence of Taylor coefficients generated in
Exercise 6.7.4.

(c) Carefully explain how this verifies that equation (1) holds for all
$x \in [-1,1]$.

Recall that our goal is to find polynomials that uniformly approximate the
absolute value function on an interval containing the non-differentiable point at
the origin. Our Taylor series for $\sqrt{1-x}$ provides a clever shortcut for
handling this task.

Exercise 6.7.7. (a) Use the fact that $|a| = \sqrt{a^2}$ to prove that, given
$\epsilon > 0$, there exists a polynomial q(x) satisfying

$$\bigl| |x| - q(x) \bigr| < \epsilon$$

for all $x \in [-1,1]$.

(b) Generalize this conclusion to an arbitrary interval $[a,b]$.
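
One way to read the hint in part (a), offered here only as a numerical sketch
and not as the intended proof, is to write $|x| = \sqrt{1 - (1 - x^2)}$ and
substitute $1 - x^2$ into a partial sum of the series for $\sqrt{1-x}$. The
Python code below does this and samples the gap on $[-1,1]$. The names `q` and
`max_gap`, the recurrence for the coefficients, and the chosen values of N are
assumptions made for the illustration.

```python
def sqrt_taylor_coefficients(N):
    """Coefficients a_0, ..., a_N of the Taylor series of sqrt(1 - x) at zero."""
    coefficients = [1.0]
    if N >= 1:
        coefficients.append(-0.5)
    for n in range(2, N + 1):
        coefficients.append(coefficients[-1] * (2 * n - 3) / (2 * n))
    return coefficients

def q(x, coefficients):
    """Evaluate sum a_n (1 - x^2)^n, a polynomial in x, using Horner's rule."""
    u = 1.0 - x * x
    total = 0.0
    for a in reversed(coefficients):
        total = total * u + a
    return total

def max_gap(N, samples=2001):
    """Largest sampled value of | |x| - q(x) | on [-1, 1]."""
    coefficients = sqrt_taylor_coefficients(N)
    points = [-1 + 2 * j / (samples - 1) for j in range(samples)]
    return max(abs(abs(x) - q(x, coefficients)) for x in points)

for N in (10, 100, 400):
    print(N, max_gap(N))
```

The gap decreases as N grows, though slowly, and it is largest at the corner
x = 0, where $1 - x^2 = 1$ sits at the endpoint of the interval of convergence.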


Proving WAT 


Earlier we suggested that proving WAT for the special case of the absolute value 
function was the key to the whole proof. Now it is time to fill in the details. 

Exercise 6.7.8. (a) Fix $a \in [-1,1]$ and sketch

$$h_a(x) = \frac{1}{2} \bigl( |x-a| + (x-a) \bigr)$$

over $[-1,1]$. Note that $h_a$ is polygonal and satisfies $h_a(x) = 0$ for all
$x \in [-1,a]$.

(b) Explain why we know $h_a(x)$ can be uniformly approximated with a
polynomial on $[-1,1]$.

(c) Let $\phi$ be a polygonal function that is linear on each subinterval of the
partition

$$-1 = a_0 < a_1 < a_2 < \cdots < a_n = 1.$$

Show there exist constants $b_0, b_1, \ldots, b_{n-1}$ so that

$$\phi(x) = \phi(-1) + b_0 h_{a_0}(x) + b_1 h_{a_1}(x) + \cdots + b_{n-1} h_{a_{n-1}}(x)$$

for all $x \in [-1,1]$.

(d) Complete the proof of WAT for the interval $[-1,1]$, and then generalize
to an arbitrary interval $[a,b]$.

Exercise 6.7.9. (a) Find a counterexample which shows that WAT is not
true if we replace the closed interval $[a,b]$ with the open interval $(a,b)$.

(b) What happens if we replace $[a,b]$ with the closed set $[a,\infty)$? Does
the theorem still hold?

Exercise 6.7.10. Is there a countable subset of polynomials C with the prop- 
erty that every continuous function on [a, b] can be uniformly approximated by 
polynomials from C? 

Exercise 6.7.11. Assume that f has a continuous derivative on $[a,b]$. Show
that there exists a polynomial p(x) such that

$$|f(x) - p(x)| < \epsilon \quad \text{and} \quad |f'(x) - p'(x)| < \epsilon$$

for all $x \in [a,b]$.


6.8 Epilogue 


The argument sketched out here for the Weierstrass Approximation Theorem 
is due to Henri Lebesgue, who published his proof in 1898. Its greatest virtue
is its relative simplicity. Starting from a single special case — the absolute value 
function — we managed to bootstrap our way up to an arbitrary continuous 
function. A downside of this approach is that by the time we reach the case of 
a general continuous function, there is no practical way to explicitly write down 
a formula for the polynomial that approximates it. 

There are a number of other proofs for WAT that don’t have this drawback. 
A particularly popular one was provided by Sergei Bernstein. Bernstein employs 
a family of polynomials — now called Bernstein polynomials — that have become 
important in their own right. Weierstrass’s original approach was also quite 
elegant. His proof has much in common with the proof of Fejér’s Theorem in
Section 8.5 on Fourier series. Not coincidentally, it is possible to derive yet
another proof of WAT as a corollary to Fejér’s Theorem. (See Exercise 8.5.11.)
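
To make the contrast with the bootstrapping argument concrete, here is a minimal sketch of the explicit formula behind Bernstein's proof. The formula $B_n(x) = \sum_{k=0}^{n} f(k/n)\binom{n}{k}x^k(1-x)^{n-k}$ on $[0, 1]$ is the standard definition of the Bernstein polynomial and is not derived in this text; the function names, the test function `kink`, and the sampling grid are illustrative assumptions.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a callable."""
    coeffs = [f(k / n) for k in range(n + 1)]
    def B(x):
        return sum(c * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k, c in enumerate(coeffs))
    return B

def max_error(f, n, samples=201):
    """Largest sampled value of |f(x) - B_n(x)| on an evenly spaced grid in [0, 1]."""
    B = bernstein(f, n)
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - B(x)) for x in grid)

def kink(x):
    """A shifted absolute value function: continuous, but not differentiable at 1/2."""
    return abs(x - 0.5)

# Sampled errors for increasing degree; they should shrink as n grows.
errors = [max_error(kink, n) for n in (4, 16, 64)]
```

The sampled error only estimates the true uniform distance, but it shrinks as the degree grows, and the approximating polynomial is written down explicitly from the values $f(k/n)$.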

The Weierstrass Approximation Theorem is set on a closed interval $[a, b]$.
Exercise 6.7.9 is included to emphasize the importance of the closed and bounded 
nature of the domain, but it should not be too surprising that the theorem will 
remain true if we replace [a, b] with an arbitrary compact set. What about 
replacing the set of polynomials? Are there other collections of relatively simple 
continuous functions that can be used to approximate an arbitrary continuous 
function? Sure there are. In Theorem 6.7.3 we saw that polygonal functions have 
this property, and there are other examples as well. In the late 1930s, Marshall 
Stone proved a far-reaching generalization of the Weierstrass Approximation 
Theorem. Stone’s version of WAT starts with an arbitrary compact set K and 
a collection C of continuous functions on K with the following three properties: 
(i) the constant function $k(x) = 1$ is in $C$,

(ii) if $p, q \in C$ and $c \in \mathbf{R}$, then $p + q$, $pq$, and $cp$ are all in $C$,

(iii) if $x \ne y$ in $K$, then there exists $p \in C$ with $p(x) \ne p(y)$.


Under these conditions, Stone showed that any continuous function on K could 
be uniformly approximated by functions in C. This result, referred to as the 
Stone-Weierstrass Theorem, has a slightly more involved proof that tracks very
closely with Lebesgue’s proof of WAT outlined in the previous section. In par- 
ticular, both arguments depend fundamentally on being able to approximate 
the absolute value function with polynomials. 

A collection of functions that possesses property (ii) of the Stone-Weierstrass
Theorem is called an algebra. An algebra that possesses property (iii) is said to
separate points. Having the constant function $k(x) = 1$ in the algebra ensures
we don’t have some $x_0 \in K$ where $p(x_0) = 0$ for all functions in our algebra.
(Why would this be problematic?) It is straightforward to check that the set of
polynomials as well as the set of polygonal functions form algebras that separate
points, and so both WAT and Theorem 6.7.3 become special cases of Stone’s
general result. For a new example, consider the collection of polynomials with
only even powers on the interval $[0, 1]$. The Stone-Weierstrass Theorem tells
us that this subset of polynomials can still uniformly approximate an arbitrary
continuous function, although if we were to switch our domain to $[-1, 1]$ then
this algebra would no longer separate points. As a final example, consider the
set

\[ C = \{a_0 + a_1 \cos(x) + \cdots + a_n \cos(nx) : a_0, a_1, \ldots, a_n \in \mathbf{R}\}. \]


In Section 8.5 we take up the theory of Fourier series, which explores when a
function has a representation as an infinite series of trigonometric functions. As
a precursor to that conversation, notice that the Stone-Weierstrass Theorem
tells us at the outset that at least every continuous function on $[0, \pi]$ is the
uniform limit of functions from $C$.


The story from Section 6.6 surrounding Taylor series expansions also deserves 
a final word. The ingenuity with which Euler and others found and exploited 
power series representations for the cast of familiar functions from calculus und- 
erstandably led to speculation that every function could be represented in such 
a fashion. (The term “function” at this time implicitly referred to functions that 
were infinitely differentiable.) This point of view effectively ended with Cauchy’s 
discovery in 1821 of the counterexample presented at the end of Section 6.6. 
So under what conditions does the Taylor series necessarily converge to the
generating function? Lagrange’s Remainder Theorem states that the difference
$E_N(x) = f(x) - S_N(x)$ between the Taylor polynomial $S_N(x)$ and the function $f(x)$ is given by

\[ E_N(x) = \frac{f^{(N+1)}(c)}{(N+1)!}\, x^{N+1} \]

for some $c$ between $0$ and $x$. For a fixed $x$, the $(N+1)!$ term in the denominator grows more rapidly than the $x^{N+1}$ term
in the numerator. Thus, if we knew for instance that

\[ |f^{(N+1)}(c)| \le M \]

for all $c \in (-R, R)$ and $N \in \mathbf{N}$, we could be sure that $E_N(x) \to 0$ and hence
that $S_N(x) \to f(x)$. This is the case for $\sin(x)$, $\cos(x)$, and $e^x$, whose derivatives
do not grow at all as $N \to \infty$. It is also possible to formulate weaker conditions
on the rate of growth of $f^{(N+1)}$ that guarantee convergence.
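
As a quick numerical illustration of this bound, the following sketch compares the actual error $|\sin(x) - S_N(x)|$ with the Lagrange estimate $M|x|^{N+1}/(N+1)!$, taking $M = 1$ because every derivative of $\sin$ is bounded by 1. The function names and the sample point $x = 2$ are assumptions chosen only for illustration.

```python
from math import factorial, sin

def taylor_sin(x, N):
    """Taylor polynomial S_N(x) of sin centered at 0 (all terms of degree <= N)."""
    total = 0.0
    for k in range(N + 1):
        if k % 2 == 1:  # the even-degree Taylor coefficients of sin vanish
            total += (-1) ** (k // 2) * x**k / factorial(k)
    return total

def lagrange_bound(x, N, M=1.0):
    """The estimate M |x|^(N+1) / (N+1)! for |E_N(x)| when all derivatives are bounded by M."""
    return M * abs(x) ** (N + 1) / factorial(N + 1)

x = 2.0
# Each row is (N, actual error, Lagrange bound).
rows = [(N, abs(sin(x) - taylor_sin(x, N)), lagrange_bound(x, N))
        for N in (3, 7, 11, 15)]
```

In each row the actual error sits below the bound, and the bound itself collapses quickly because the factorial outpaces the power of $x$.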

It is not altogether clear whether Cauchy’s counterexample should come as 
a surprise. The fact that every previous search for a Taylor series ended in 
success certainly gives the impression that a power series representation is an 
intrinsic property of infinitely differentiable functions. But notice what we are 
saying here. A Taylor series for a function $f$ is constructed from the values
of $f$ and its derivatives at the origin. If the Taylor series converges to $f$ on
some interval $(-R, R)$, then the behavior of $f$ near zero completely determines
its behavior at every point in $(-R, R)$. One implication of this would be that
if two functions with Taylor series agree on some small neighborhood $(-\epsilon, \epsilon)$,
then these two functions would have to be the same everywhere. When it is
put this way, we probably should not expect a Taylor series to always converge 
back to the function from which it was derived. As we have seen, this is not 
the case for real-valued functions. What is fascinating, however, is that results 
of this nature do hold for functions of a complex variable. The definition of the 
derivative looks symbolically the same when the real numbers are replaced by 
complex numbers, but the implications are profoundly different. In this setting, 
a function that is differentiable at every point in some open disc must necessarily 
be infinitely differentiable on this set. This supplies the ingredients to construct 
the Taylor series that in every instance converges uniformly on compact sets to 
the function that generated it. 



Chapter 7 


The Riemann Integral 

7.1 Discussion: How Should Integration 
be Defined? 

The Fundamental Theorem of Calculus is a statement about the inverse relation- 
ship between differentiation and integration. It comes in two parts, depending 
on whether we are differentiating an integral or integrating a derivative. Under 
suitable hypotheses on the functions $f$ and $F$, the Fundamental Theorem of
Calculus states that

(i) $\int_a^b F'(x)\,dx = F(b) - F(a)$ and

(ii) if $G(x) = \int_a^x f(t)\,dt$, then $G'(x) = f(x)$.

Before we can undertake any type of rigorous investigation of these statements,
we need to settle on a definition for $\int_a^b f$. Historically, the concept of integration
was defined as the inverse process of differentiation. In other words, the integral
of a function $f$ was understood to be a function $F$ that satisfied $F' = f$. Newton,
Leibniz, Fermat, and the other founders of calculus then went on to explore the 
relationship between antiderivatives and the problem of computing areas. This 
approach is ultimately unsatisfying from the point of view of analysis because it 
results in a very limited number of functions that can be integrated. Recall that 
every derivative satisfies the intermediate value property (Darboux’s Theorem, 
Theorem 5.2.7). This means that any function with a jump discontinuity cannot 
be a derivative. If we want to define integration via antidifferentiation, then we 
must accept the consequence that a function as simple as 

\[ h(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 2 & \text{for } 1 \le x \le 2 \end{cases} \]

is not integrable on the interval $[0, 2]$.

Figure 7.1: A Riemann Sum.

A very interesting shift in emphasis occurred around 1850 in the work of 
Cauchy, and soon after in the work of Bernhard Riemann. The idea was to 
completely divorce integration from the derivative and instead use the notion 
of “area under the curve” as a starting point for building a rigorous definition 
of the integral. The reasons for this were complicated. As we have mentioned 
earlier (Section 1.2), the concept of function was undergoing a transformation. 
The traditional understanding of a function as a holistic formula such as $f(x) = x^2$
was being replaced with a more liberal interpretation, which included such
bizarre constructions as Dirichlet’s function discussed in Section 4.1. Serving as 
a catalyst to this evolution was the budding theory of Fourier series (discussed 
in Section 8.5), which required, among other things, the need to be able to 
integrate these more unruly objects. 

The Riemann integral, as it is called today, is the one usually discussed in
introductory calculus. Starting with a function $f$ on $[a, b]$, we partition the
domain into small subintervals. On each subinterval $[x_{k-1}, x_k]$, we pick some
point $c_k \in [x_{k-1}, x_k]$ and use the $y$-value $f(c_k)$ as an approximation for $f$ on
$[x_{k-1}, x_k]$. Graphically speaking, the result is a row of thin rectangles
constructed to approximate the area between $f$ and the $x$-axis. The area of each
rectangle is $f(c_k)(x_k - x_{k-1})$, and so the total area of all of the rectangles is
given by the Riemann sum (Fig. 7.1)

\[ \sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}). \]

Note that “area” here comes with the understanding that areas below the $x$-axis
are assigned a negative value.
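
The sum just described translates directly into a few lines of code. The sketch below is only an illustration: the helper names, the choice of midpoints as the sample points $c_k$, and the test function $x^2$ on $[0, 1]$ are all assumptions, and the arithmetic is floating point rather than exact.

```python
def riemann_sum(f, partition, tags):
    """Sum of f(c_k) * (x_k - x_{k-1}) over the subintervals of the partition."""
    if len(tags) != len(partition) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for k in range(1, len(partition)):
        left, right, c = partition[k - 1], partition[k], tags[k - 1]
        if not left <= c <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(c) * (right - left)
    return total

def uniform_partition(a, b, n):
    """The n + 1 equally spaced points a = x_0 < x_1 < ... < x_n = b."""
    return [a + (b - a) * k / n for k in range(n + 1)]

def midpoints(partition):
    """One sample point per subinterval: its midpoint."""
    return [(partition[k - 1] + partition[k]) / 2 for k in range(1, len(partition))]

square = lambda x: x * x
approximations = []
for n in (10, 100, 1000):
    P = uniform_partition(0.0, 1.0, n)
    approximations.append(riemann_sum(square, P, midpoints(P)))
```

As the subintervals get thinner, the three sums settle toward $1/3$, the value calculus assigns to the area under $x^2$ on $[0, 1]$.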

What should be evident from the graph is that the accuracy of the Riemann-sum
approximation seems to improve as the rectangles get thinner. In some
sense, we take the limit of these approximating Riemann sums as the width of
the individual subintervals of the partitions tends to zero. This limit, if it exists,
is Riemann’s definition of $\int_a^b f$.

This brings us to a handful of questions. Creating a rigorous meaning for
the limit just referred to is not too difficult. What will be of most interest
to us, as it was to Riemann, is deciding what types of functions can be
integrated using this procedure. Specifically, what conditions on $f$ guarantee
that this limit exists?


The theory of the Riemann integral turns on the observation that smaller
subintervals produce better approximations to the function $f$. On each
subinterval $[x_{k-1}, x_k]$, the function $f$ is approximated by its value at some point
$c_k \in [x_{k-1}, x_k]$. The quality of the approximation is directly related to the
difference

\[ |f(x) - f(c_k)| \]

as $x$ ranges over the subinterval. Because the subintervals can be chosen to
have arbitrarily small width, this means that we want $f(x)$ to be close to $f(c_k)$
whenever $x$ is close to $c_k$. But this sounds like a discussion of continuity! We
will soon see that the continuity of $f$ is intimately related to the existence of
the Riemann integral $\int_a^b f$.

Is continuity sufficient to prove that the Riemann sums converge to a
well-defined limit? Is it necessary, or can the Riemann integral handle a
discontinuous function such as $h(x)$ mentioned earlier? Relying on the intuitive notion
of area, it would seem that $\int_0^2 h = 3$, but does the Riemann integral reach this
conclusion? If so, how discontinuous can a function be before it fails to be
integrable? Can the Riemann integral make sense out of something as pathological
as Dirichlet’s function on the interval $[0, 1]$?

A function such as

\[ g(x) = \begin{cases} x^2 \sin\left(\frac{1}{x}\right) & \text{for } x \ne 0 \\ 0 & \text{for } x = 0 \end{cases} \]

raises another interesting question. Here is an example of a differentiable
function, studied in Section 5.1, where the derivative $g'(x)$ is not continuous. As we
explore the class of integrable functions, some attempt must be made to reunite
the integral with the derivative. Having defined integration independently of 
differentiation, we would like to come back and investigate the conditions under 
which equations (i) and (ii) from the Fundamental Theorem of Calculus stated 
earlier hold. If we are making a wish list for the types of functions that we 
want to be integrable, then in light of equation (i) it seems desirable to expect 
this set to at least contain the set of derivatives. The fact that derivatives are 
not always continuous is further motivation not to content ourselves with an 
integral that cannot handle some discontinuities. 
7.2 The Definition of the Riemann Integral


Although it has the benefit of some polish due to Darboux, the development 
of the integral presented in this chapter is closely related to the procedure just 
discussed. In place of Riemann sums, we will construct upper sums and lower 
sums (Fig. 7.2), and in place of a limit we will use a supremum and an infimum. 

Throughout this section, it is assumed that we are working with a bounded
function $f$ on a closed interval $[a, b]$, meaning that there exists an $M > 0$ such
that $|f(x)| \le M$ for all $x \in [a, b]$.


Partitions, Upper Sums, and Lower Sums 


Definition 7.2.1. A partition $P$ of $[a, b]$ is a finite set of points from $[a, b]$ that
includes both $a$ and $b$. The notational convention is to always list the points of
a partition $P = \{x_0, x_1, x_2, \ldots, x_n\}$ in increasing order; thus,

\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b. \]

For each subinterval $[x_{k-1}, x_k]$ of $P$, let

\[ m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\} \quad \text{and} \quad M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\}. \]

The lower sum of $f$ with respect to $P$ is given by

\[ L(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}). \]

Likewise, we define the upper sum of $f$ with respect to $P$ by

\[ U(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}). \]

For a particular partition $P$, it is clear that $U(f, P) \ge L(f, P)$. The fact that this
same inequality holds if the upper and lower sums are computed with respect
to different partitions is the content of the next two lemmas.
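
For a hands-on feel for these definitions, here is a small sketch that computes $L(f, P)$ and $U(f, P)$ exactly with rational arithmetic. It rests on a loud assumption that is not part of the definition: $f$ must be monotone on each subinterval, so that $m_k$ and $M_k$ are attained at the endpoints. The helper name and the sample partition are hypothetical.

```python
from fractions import Fraction

def lower_upper_sums(f, partition):
    """Return (L(f, P), U(f, P)) for f monotone on each subinterval of P.

    Assumption: on each [x_{k-1}, x_k] the infimum m_k and supremum M_k are
    attained at the endpoints, which holds for monotone f but not in general.
    """
    lower = upper = Fraction(0)
    for left, right in zip(partition, partition[1:]):
        values = (f(left), f(right))
        lower += min(values) * (right - left)
        upper += max(values) * (right - left)
    return lower, upper

# f(x) = x^2 on [0, 1] with the partition P = {0, 1/2, 1}.
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
L, U = lower_upper_sums(lambda x: x * x, P)
# L = 0*(1/2) + (1/4)*(1/2) = 1/8 and U = (1/4)*(1/2) + 1*(1/2) = 5/8
```

For a function that is not monotone on some subinterval, the endpoint shortcut fails and $m_k$, $M_k$ would have to be found by other means.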

Definition 7.2.2. A partition $Q$ is a refinement of a partition $P$ if $Q$ contains
all of the points of $P$; that is, if $P \subseteq Q$.

Lemma 7.2.3. If $P \subseteq Q$, then $L(f, P) \le L(f, Q)$, and $U(f, P) \ge U(f, Q)$.

Proof. Consider what happens when we refine $P$ by adding a single point $z$ to
some subinterval $[x_{k-1}, x_k]$ of $P$. Focusing on the lower sum for a moment, we have

\[ m_k(x_k - x_{k-1}) = m_k(x_k - z) + m_k(z - x_{k-1}) \le m_k'(x_k - z) + m_k''(z - x_{k-1}), \]

where

\[ m_k' = \inf\{f(x) : x \in [z, x_k]\} \quad \text{and} \quad m_k'' = \inf\{f(x) : x \in [x_{k-1}, z]\} \]

are each necessarily as large or larger than $m_k$.

By induction, we have $L(f, P) \le L(f, Q)$, and an analogous argument holds
for the upper sums. □

Figure 7.2: Upper and Lower Sums.


Lemma 7.2.4. If $P_1$ and $P_2$ are any two partitions of $[a, b]$, then $L(f, P_1) \le U(f, P_2)$.

Proof. Let $Q = P_1 \cup P_2$ be the so-called common refinement of $P_1$ and $P_2$.
Because $P_1 \subseteq Q$ and $P_2 \subseteq Q$, it follows that

\[ L(f, P_1) \le L(f, Q) \le U(f, Q) \le U(f, P_2). \]

□
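
The chain of inequalities in this proof can be checked on a concrete case. The sketch below takes the increasing function $f(x) = x^3$ on $[0, 1]$, so that each $m_k$ and $M_k$ is the value at the left and right endpoint respectively, and forms the common refinement of two partitions. The function names and the two sample partitions are assumptions made only for illustration.

```python
from fractions import Fraction

def sums(f, partition):
    """(L(f, P), U(f, P)) for an increasing f, where m_k and M_k sit at the endpoints."""
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(l) * (r - l) for l, r in pairs)
    upper = sum(f(r) * (r - l) for l, r in pairs)
    return lower, upper

def common_refinement(P1, P2):
    """The union of the two point sets, listed in increasing order."""
    return sorted(set(P1) | set(P2))

f = lambda x: x ** 3
P1 = [Fraction(0), Fraction(1, 3), Fraction(1)]
P2 = [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
Q = common_refinement(P1, P2)

L1, U1 = sums(f, P1)
L2, U2 = sums(f, P2)
LQ, UQ = sums(f, Q)
```

Here $L(f, P_1) \le L(f, Q) \le U(f, Q) \le U(f, P_2)$, exactly as in the proof, and the same holds with the roles of $P_1$ and $P_2$ exchanged.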
Integrability

Intuitively, it helps to visualize a particular upper sum as an overestimate for the 
value of the integral and a lower sum as an underestimate. As the partitions get 
more refined, the upper sums get potentially smaller while the lower sums get 
potentially larger. A function is integrable if the upper and lower sums “meet” 
at some common value in the middle. 

Rather than taking a limit of these sums, we will instead make use of the 
Axiom of Completeness and consider the infimum of the upper sums and the 
supremum of the lower sums. 


Definition 7.2.5. Let $\mathcal{P}$ be the collection of all possible partitions of the
interval $[a, b]$. The upper integral of $f$ is defined to be

\[ U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}. \]

In a similar way, define the lower integral of $f$ by

\[ L(f) = \sup\{L(f, P) : P \in \mathcal{P}\}. \]


The following fact is not surprising. 


Lemma 7.2.6. For any bounded function $f$ on $[a, b]$, it is always the case that
$U(f) \ge L(f)$.

Proof. Exercise 7.2.1. □


Definition 7.2.7 (Riemann Integrability). A bounded function $f$ defined
on the interval $[a, b]$ is Riemann-integrable if $U(f) = L(f)$. In this case, we
define $\int_a^b f$ or $\int_a^b f(x)\,dx$ to be this common value; namely,

\[ \int_a^b f = U(f) = L(f). \]

The modifier “Riemann” in front of “integrable” accurately suggests that 
there are other ways to define the integral. In fact, our work in this chapter will 
expose the need for a different approach, one of which is discussed in Section 8.1. 
In this chapter, the Riemann integral is the only method under consideration, 
so it will usually be convenient to drop the modifier “Riemann” and simply refer 
to a function as being “integrable.” 


Criteria for Integrability 


To summarize the situation thus far, it is always the case for a bounded function
$f$ on $[a, b]$ that

\[ \sup\{L(f, P) : P \in \mathcal{P}\} = L(f) \le U(f) = \inf\{U(f, P) : P \in \mathcal{P}\}. \]

The function $f$ is integrable if the inequality is an equality. The major thrust
of our investigation of the integral is to describe, as best we can, the class
of integrable functions. The preceding inequality reveals that integrability is
really equivalent to the existence of partitions whose upper and lower sums are
arbitrarily close together.


Theorem 7.2.8 (Integrability Criterion). A bounded function $f$ is integrable
on $[a, b]$ if and only if, for every $\epsilon > 0$, there exists a partition $P_\epsilon$ of $[a, b]$
such that

\[ U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon. \]

Proof. Let $\epsilon > 0$. If such a partition $P_\epsilon$ exists, then

\[ U(f) - L(f) \le U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon. \]

Because $\epsilon$ is arbitrary, it must be that $U(f) = L(f)$, so $f$ is integrable. (To be
absolutely precise here, we could throw in a reference to Theorem 1.2.6.)

The proof of the converse statement is a familiar triangle inequality argument
with parentheses in place of absolute value bars because, in each case, we know
which quantity is larger. Because $U(f)$ is the greatest lower bound of the upper
sums, we know that, given some $\epsilon > 0$, there must exist a partition $P_1$ such that

\[ U(f, P_1) < U(f) + \frac{\epsilon}{2}. \]

Likewise, there exists a partition $P_2$ satisfying

\[ L(f, P_2) > L(f) - \frac{\epsilon}{2}. \]

Now, let $P_\epsilon = P_1 \cup P_2$ be the common refinement. Keeping in mind that the
integrability of $f$ means $U(f) = L(f)$, we can write

\[ U(f, P_\epsilon) - L(f, P_\epsilon) \le U(f, P_1) - L(f, P_2) < \left(U(f) + \frac{\epsilon}{2}\right) - \left(L(f) - \frac{\epsilon}{2}\right) = \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]

□


In the discussion at the beginning of this chapter, it became clear that
integrability is closely tied to the concept of continuity. To make this observation
more precise, let $P = \{x_0, x_1, x_2, \ldots, x_n\}$ be an arbitrary partition of $[a, b]$, and
define $\Delta x_k = x_k - x_{k-1}$. Then,

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\,\Delta x_k, \]

where $M_k$ and $m_k$ are the supremum and infimum of the function on the interval
$[x_{k-1}, x_k]$, respectively. Our ability to control the size of $U(f, P) - L(f, P)$ hinges
on the differences $M_k - m_k$, which we can interpret as the variation in the range
of the function over the interval $[x_{k-1}, x_k]$. Restricting the variation of $f$ over
arbitrarily small intervals in $[a, b]$ is precisely what it means to say that $f$ is
uniformly continuous on this set.
Theorem 7.2.9. If $f$ is continuous on $[a, b]$, then it is integrable.

Proof. Because $f$ is continuous on a compact set, it must be bounded. It is also
uniformly continuous for the same reason. This means that, given $\epsilon > 0$, there
exists a $\delta > 0$ so that $|x - y| < \delta$ guarantees

\[ |f(x) - f(y)| < \frac{\epsilon}{b - a}. \]

Now, let $P$ be a partition of $[a, b]$ where $\Delta x_k = x_k - x_{k-1}$ is less than $\delta$ for
every subinterval of $P$.

Given a particular subinterval $[x_{k-1}, x_k]$ of $P$, we know from the Extreme
Value Theorem (Theorem 4.4.2) that the supremum $M_k = f(z_k)$ for some $z_k \in [x_{k-1}, x_k]$.
In addition, the infimum $m_k$ is attained at some point $y_k$ also in the
interval $[x_{k-1}, x_k]$. But this means $|z_k - y_k| < \delta$, so

\[ M_k - m_k = f(z_k) - f(y_k) < \frac{\epsilon}{b - a}. \]

Finally,

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\,\Delta x_k < \frac{\epsilon}{b - a} \sum_{k=1}^{n} \Delta x_k = \epsilon, \]

and $f$ is integrable by the criterion given in Theorem 7.2.8.

□
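
The recipe in this proof can be followed step by step for a specific function. The sketch below does so for $f(x) = x^2$ on $[0, 2]$, where a suitable $\delta$ is available in closed form because $|x^2 - y^2| = |x + y|\,|x - y| \le 2b\,|x - y|$ on $[a, b]$ when $0 \le a < b$. The function name, the choice of an equally spaced partition, and the value $\epsilon = 1/10$ are assumptions for illustration only.

```python
from fractions import Fraction
from math import floor

def gap_for_square(eps, a=Fraction(0), b=Fraction(2)):
    """Return (n, U - L) for f(x) = x^2 on [a, b] with 0 <= a < b, following the proof."""
    lipschitz = 2 * b                      # |x^2 - y^2| <= 2b |x - y| on [a, b]
    delta = eps / (lipschitz * (b - a))    # |x - y| < delta gives |f(x) - f(y)| < eps / (b - a)
    n = floor((b - a) / delta) + 1         # enough equal pieces that each width is < delta
    P = [a + (b - a) * k / n for k in range(n + 1)]
    # f is increasing here, so M_k - m_k = f(x_k) - f(x_{k-1}) on each subinterval.
    gap = sum((r * r - l * l) * (r - l) for l, r in zip(P, P[1:]))
    return n, gap

eps = Fraction(1, 10)
n, gap = gap_for_square(eps)
```

With $\epsilon = 1/10$ the recipe asks for 161 equal subintervals, and the resulting gap $U(f, P) - L(f, P) = 8/161$ is comfortably below $\epsilon$; the proof's estimate is safe rather than sharp.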


Exercises 

Exercise 7.2.1. Let $f$ be a bounded function on $[a, b]$, and let $P$ be an arbitrary
partition of $[a, b]$. First, explain why $U(f) \ge L(f, P)$. Now, prove Lemma 7.2.6.

Exercise 7.2.2. Consider $f(x) = 1/x$ over the interval $[1, 4]$. Let $P$ be the
partition consisting of the points $\{1, 3/2, 2, 4\}$.

(a) Compute $L(f, P)$, $U(f, P)$, and $U(f, P) - L(f, P)$.

(b) What happens to the value of $U(f, P) - L(f, P)$ when we add the point 3
to the partition?

(c) Find a partition $P'$ of $[1, 4]$ for which $U(f, P') - L(f, P') < 2/5$.

Exercise 7.2.3 (Sequential Criterion for Integrability). (a) Prove that
a bounded function $f$ is integrable on $[a, b]$ if and only if there exists a
sequence of partitions $(P_n)_{n=1}^{\infty}$ satisfying

\[ \lim_{n \to \infty} \left[ U(f, P_n) - L(f, P_n) \right] = 0, \]

and in this case $\int_a^b f = \lim_{n \to \infty} U(f, P_n) = \lim_{n \to \infty} L(f, P_n)$.

(b) For each $n$, let $P_n$ be the partition of $[0, 1]$ into $n$ equal subintervals. Find
formulas for $U(f, P_n)$ and $L(f, P_n)$ if $f(x) = x$. The formula $1 + 2 + 3 + \cdots + n = n(n + 1)/2$ will be useful.

(c) Use the sequential criterion for integrability from (a) to show directly that
$f(x) = x$ is integrable on $[0, 1]$ and compute $\int_0^1 f$.

Exercise 7.2.4. Let $g$ be bounded on $[a, b]$ and assume there exists a partition
$P$ with $L(g, P) = U(g, P)$. Describe $g$. Is $g$ necessarily continuous? Is it
integrable? If so, what is the value of $\int_a^b g$?

Exercise 7.2.5. Assume that, for each $n$, $f_n$ is an integrable function on $[a, b]$.
If $(f_n) \to f$ uniformly on $[a, b]$, prove that $f$ is also integrable on this set. (We
will see that this conclusion does not necessarily follow if the convergence is
pointwise.)

Exercise 7.2.6. A tagged partition $(P, \{c_k\})$ is one where in addition to a
partition $P$ we choose a sampling point $c_k$ in each of the subintervals $[x_{k-1}, x_k]$.
The corresponding Riemann sum,

\[ R(f, P) = \sum_{k=1}^{n} f(c_k)\,\Delta x_k, \]

is discussed in Section 7.1, where the following definition is alluded to.

Riemann’s Original Definition of the Integral: A bounded function $f$ is
integrable on $[a, b]$ with $\int_a^b f = A$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that
for any tagged partition $(P, \{c_k\})$ satisfying $\Delta x_k < \delta$ for all $k$, it follows that

\[ |R(f, P) - A| < \epsilon. \]

Show that if $f$ satisfies Riemann’s definition above, then $f$ is integrable in the
sense of Definition 7.2.7. (The full equivalence of these two characterizations of
integrability is proved in Section 8.1.)

Exercise 7.2.7. Let $f : [a, b] \to \mathbf{R}$ be increasing on the set $[a, b]$ (i.e., $f(x) \le f(y)$
whenever $x < y$). Show that $f$ is integrable on $[a, b]$.
7.3 Integrating Functions with Discontinuities


The fact that continuous functions are integrable is not so much a fortunate 
discovery as it is evidence for a well-designed integral. Riemann’s integral is a 
modification of Cauchy’s definition of the integral, and Cauchy’s definition was 
crafted specifically to work on continuous functions. The interesting issue is 
discovering just how dependent the Riemann integral is on the continuity of the 
integrand. 


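Example 7.3.1. Consider the function

\[ f(x) = \begin{cases} 1 & \text{for } x \ne 1 \\ 0 & \text{for } x = 1 \end{cases} \]

on the interval $[0, 2]$. If $P$ is any partition of $[0, 2]$, a quick calculation reveals
that $U(f, P) = 2$. The lower sum $L(f, P)$ will be less than 2 because any
subinterval of $P$ that contains $x = 1$ will contribute zero to the value of the
lower sum. The way to show that $f$ is integrable is to construct a partition that
minimizes the effect of the discontinuity by embedding $x = 1$ into a very small
subinterval.

Let $\epsilon > 0$, and consider the partition $P_\epsilon = \{0, 1 - \epsilon/3, 1 + \epsilon/3, 2\}$. The middle subinterval has length $2\epsilon/3$ and contributes zero to the lower sum, so

\[ L(f, P_\epsilon) = 1\left(1 - \frac{\epsilon}{3}\right) + 0\left(\frac{2\epsilon}{3}\right) + 1\left(1 - \frac{\epsilon}{3}\right) = 2 - \frac{2}{3}\epsilon. \]

Because $U(f, P_\epsilon) = 2$, we have

\[ U(f, P_\epsilon) - L(f, P_\epsilon) = \frac{2}{3}\epsilon < \epsilon. \]

We can now use Theorem 7.2.8 to conclude that $f$ is integrable.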
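
The computation in Example 7.3.1 is easy to reproduce exactly. In the sketch below the values $m_k$ and $M_k$ are not discovered numerically; they are filled in from the reasoning in the example (the infimum is 0 exactly on a subinterval containing $x = 1$, and the supremum is always 1). The function name and the sample value of $\epsilon$ are assumptions.

```python
from fractions import Fraction

def example_sums(eps):
    """(L, U) for the function of Example 7.3.1 on {0, 1 - eps/3, 1 + eps/3, 2}.

    Assumes 0 < eps < 3 so that the four points are listed in increasing order.
    """
    P = [Fraction(0), 1 - eps / 3, 1 + eps / 3, Fraction(2)]
    lower = upper = Fraction(0)
    for left, right in zip(P, P[1:]):
        contains_one = left <= 1 <= right
        m = 0 if contains_one else 1   # inf of f is 0 exactly when x = 1 is in the subinterval
        M = 1                          # every subinterval holds a point other than 1
        lower += m * (right - left)
        upper += M * (right - left)
    return lower, upper

eps = Fraction(1, 100)
L, U = example_sums(eps)
```

The gap $U - L$ equals the length $2\epsilon/3$ of the one subinterval that contains the discontinuity, which is the whole point of the construction.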


Although the function in Example 7.3.1 is extremely simple, the method 
used to show it is integrable is really the same one used to prove that any 
bounded function with a single discontinuity is integrable. The notation in the 
following proof is more cumbersome, but the essence of the argument is that the 
misbehavior of the function at its discontinuity is isolated inside a particularly 
small subinterval of the partition. 


Theorem 7.3.2. If $f : [a, b] \to \mathbf{R}$ is bounded and $f$ is integrable on $[c, b]$ for all
$c \in (a, b)$, then $f$ is integrable on $[a, b]$. An analogous result holds at the other
endpoint.

Proof. Let $\epsilon > 0$. As usual, our task is to produce a partition $P$ such that
$U(f, P) - L(f, P) < \epsilon$. For any partition, we can always write

\[ U(f, P) - L(f, P) = \sum_{k=1}^{n} (M_k - m_k)\,\Delta x_k = (M_1 - m_1)(x_1 - a) + \sum_{k=2}^{n} (M_k - m_k)\,\Delta x_k, \]

so the first step is to choose $x_1$ close enough to $a$ so that

\[ (M_1 - m_1)(x_1 - a) < \frac{\epsilon}{2}. \]

This is not too difficult. Because $f$ is bounded, we know there exists $M > 0$
satisfying $|f(x)| \le M$ for all $x \in [a, b]$. Noting that $M_1 - m_1 \le 2M$, let’s pick
$x_1$ so that

\[ x_1 - a < \frac{\epsilon}{4M}. \]

Now, by hypothesis, $f$ is integrable on $[x_1, b]$, so there exists a partition $P_1$ of
$[x_1, b]$ for which

\[ U(f, P_1) - L(f, P_1) < \frac{\epsilon}{2}. \]

Finally, we let $P = \{a\} \cup P_1$ be a partition of $[a, b]$, from which it follows
that

\[ U(f, P) - L(f, P) \le (2M)(x_1 - a) + \left(U(f, P_1) - L(f, P_1)\right) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]

□


Theorem 7.3.2 enables us to prove that a bounded function on a closed 
interval with a single discontinuity at an endpoint is still integrable. In the 
next section, we will prove that integrability on the intervals $[a, b]$ and $[b, d]$
is equivalent to integrability on $[a, d]$. This property, together with an induction
argument, leads to the conclusion that any function with a finite number
of discontinuities is still integrable. What if the number of discontinuities is 
infinite? 


Example 7.3.3. Recall Dirichlet’s function

\[ g(x) = \begin{cases} 1 & \text{for } x \text{ rational} \\ 0 & \text{for } x \text{ irrational} \end{cases} \]

from Section 4.1. If $P$ is some partition of $[0, 1]$, then the density of the rationals
in $\mathbf{R}$ implies that every subinterval of $P$ will contain a point where $g(x) = 1$. It
follows that $U(g, P) = 1$. On the other hand, $L(g, P) = 0$ because the irrationals
are also dense in $\mathbf{R}$. Because this is the case for every partition $P$, we see that
the upper integral $U(g) = 1$ and the lower integral $L(g) = 0$. The two are not
equal, so we conclude that Dirichlet’s function is not integrable.

How discontinuous can a function be before it fails to be integrable? Before 
jumping to the hasty (and incorrect) conclusion that the Riemann integral fails 
for functions with more than a finite number of discontinuities, we should realize 
that Dirichlet’s function is discontinuous at every point in [0,1]. It would be 
useful to investigate a function where the discontinuities are infinite in number 
but do not necessarily make up all of [0,1]. Thomae’s function, also defined 
in Section 4.1, is one such example. The discontinuous points of this function 
are precisely the rational numbers in [0, 1]. In the exercises to follow we will
see that Thomae’s function is Riemann-integrable, raising the bar for allowable 
discontinuous points to include potentially infinite sets. 

The conclusion of this story is contained in the doctoral dissertation of Henri 
Lebesgue, who presented his work in 1901. Lebesgue’s elegant criterion for 
Riemann integrability is explored in great detail in Section 7.6. For the moment, 
though, we will take a short detour from questions of integrability and construct 
a proof of the celebrated Fundamental Theorem of Calculus. 

Exercises 

Exercise 7.3.1. Consider the function

\[ h(x) = \begin{cases} 1 & \text{for } 0 \le x < 1 \\ 2 & \text{for } x = 1 \end{cases} \]

over the interval $[0, 1]$.

(a) Show that $L(h, P) = 1$ for every partition $P$ of $[0, 1]$.

(b) Construct a partition $P$ for which $U(h, P) < 1 + 1/10$.

(c) Given $\epsilon > 0$, construct a partition $P_\epsilon$ for which $U(h, P_\epsilon) < 1 + \epsilon$.

Exercise 7.3.2. Recall that Thomae’s function

\[ t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases} \]

has a countable set of discontinuities occurring at precisely every rational number.
Follow these steps to prove $t(x)$ is integrable on $[0, 1]$ with $\int_0^1 t = 0$.

(a) First argue that $L(t, P) = 0$ for any partition $P$ of $[0, 1]$.

(b) Let $\epsilon > 0$, and consider the set of points $D_{\epsilon/2} = \{x \in [0, 1] : t(x) \ge \epsilon/2\}$.
How big is $D_{\epsilon/2}$?

(c) To complete the argument, explain how to construct a partition $P_\epsilon$ of $[0, 1]$
so that $U(t, P_\epsilon) < \epsilon$.

Exercise 7.3.3. Let

\[ f(x) = \begin{cases} 1 & \text{if } x = 1/n \text{ for some } n \in \mathbf{N} \\ 0 & \text{otherwise.} \end{cases} \]

Show that $f$ is integrable on $[0, 1]$ and compute $\int_0^1 f$.

Exercise 7.3.4. Let $f$ and $g$ be functions defined on (possibly different) closed
intervals, and assume the range of $f$ is contained in the domain of $g$ so that the
composition $g \circ f$ is properly defined.

(a) Show, by example, that it is not the case that if $f$ and $g$ are integrable,
then $g \circ f$ is integrable.

Now decide on the validity of each of the following conjectures, supplying
a proof or counterexample as appropriate.

(b) If $f$ is increasing and $g$ is integrable, then $g \circ f$ is integrable.

(c) If $f$ is integrable and $g$ is increasing, then $g \circ f$ is integrable.


Exercise 7.3.5. Provide an example or give a reason why the request is impossible.

(a) A sequence $(f_n) \to f$ pointwise, where each $f_n$ has at most a finite number
of discontinuities but $f$ is not integrable.

(b) A sequence $(g_n) \to g$ uniformly where each $g_n$ has at most a finite number
of discontinuities and $g$ is not integrable.

(c) A sequence $(h_n) \to h$ uniformly where each $h_n$ is not integrable but $h$ is
integrable.

Exercise 7.3.6. Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of all the rationals in
$[0, 1]$, and define

\[ g_n(x) = \begin{cases} 1 & \text{if } x = r_n \\ 0 & \text{otherwise.} \end{cases} \]

(a) Is $G(x) = \sum_{n=1}^{\infty} g_n(x)$ integrable on $[0, 1]$?

(b) Is $F(x) = \sum_{n=1}^{\infty} g_n(x)/n$ integrable on $[0, 1]$?

Exercise 7.3.7. Assume $f : [a, b] \to \mathbf{R}$ is integrable.

(a) Show that if $g$ satisfies $g(x) = f(x)$ for all but a finite number of points
in $[a, b]$, then $g$ is integrable as well.

(b) Find an example to show that $g$ may fail to be integrable if it differs from
$f$ at a countable number of points.

Exercise 7.3.8. As in Exercise 7.3.6, let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of
the rationals in $[0, 1]$, but this time define

\[ h_n(x) = \begin{cases} 1 & \text{if } r_n < x \le 1 \\ 0 & \text{if } 0 \le x \le r_n. \end{cases} \]

Show $H(x) = \sum_{n=1}^{\infty} h_n(x)/2^n$ is integrable on $[0, 1]$ even though it has
discontinuities at every rational point.
Exercise 7.3.9 (Content Zero). A set $A \subseteq [a, b]$ has content zero if for every
$\epsilon > 0$ there exists a finite collection of open intervals $\{O_1, O_2, \ldots, O_N\}$ that
contain $A$ in their union and whose lengths sum to $\epsilon$ or less. Using $|O_n|$ to refer
to the length of each interval, we have

\[ A \subseteq \bigcup_{n=1}^{N} O_n \quad \text{and} \quad \sum_{n=1}^{N} |O_n| \le \epsilon. \]

(a) Let $f$ be bounded on $[a, b]$. Show that if the set of discontinuous points of
$f$ has content zero, then $f$ is integrable.

(b) Show that any finite set has content zero.

(c) Content zero sets do not have to be finite. They do not have to be countable.
Show that the Cantor set $C$ defined in Section 3.1 has content zero.

(d) Prove that

\[ h(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C \end{cases} \]

is integrable, and find the value of the integral.


7.4 Properties of the Integral 

Before embarking on the proof of the Fundamental Theorem of Calculus, we 
need to verify what are probably some very familiar properties of the integral. 
The discussion in the previous section has already made use of the following 
fact. 

Theorem 7.4.1. Assume f : [a, b\ —> R is bounded , and let c E (a, b). Then, 


f is integrable on 
case, we have 


a, b\ if and only if f is integrable on [a, c] and [c, b \ . In this 


/= / /+ / /• 


a 


a 


Proof. If $f$ is integrable on $[a, b]$, then for $\epsilon > 0$ there exists a partition $P$ such
that $U(f, P) - L(f, P) < \epsilon$. Because refining a partition can only potentially
bring the upper and lower sums closer together, we can simply add $c$ to $P$ if
it is not already there. Then, let $P_1 = P \cap [a, c]$ be a partition of $[a, c]$, and
$P_2 = P \cap [c, b]$ be a partition of $[c, b]$. It follows that

\[
U(f, P_1) - L(f, P_1) < \epsilon \quad \text{and} \quad U(f, P_2) - L(f, P_2) < \epsilon,
\]

implying that $f$ is integrable on $[a, c]$ and $[c, b]$.

Conversely, if we are given that $f$ is integrable on the two smaller intervals
$[a, c]$ and $[c, b]$, then given an $\epsilon > 0$ we can produce partitions $P_1$ and $P_2$ of $[a, c]$
and $[c, b]$, respectively, such that

\[
U(f, P_1) - L(f, P_1) < \frac{\epsilon}{2} \quad \text{and} \quad U(f, P_2) - L(f, P_2) < \frac{\epsilon}{2}.
\]

Letting $P = P_1 \cup P_2$ produces a partition of $[a, b]$ for which

\[
U(f, P) - L(f, P) < \epsilon.
\]

Thus, $f$ is integrable on $[a, b]$.

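Continuing to let $P = P_1 \cup P_2$ as earlier, we have

\[
\begin{aligned}
\int_a^b f \le U(f, P) &< L(f, P) + \epsilon \\
&= L(f, P_1) + L(f, P_2) + \epsilon \\
&\le \int_a^c f + \int_c^b f + \epsilon,
\end{aligned}
\]

which implies $\int_a^b f \le \int_a^c f + \int_c^b f$. To get the other inequality, observe that

\[
\begin{aligned}
\int_a^c f + \int_c^b f &\le U(f, P_1) + U(f, P_2) \\
&< L(f, P_1) + L(f, P_2) + \epsilon \\
&= L(f, P) + \epsilon \\
&\le \int_a^b f + \epsilon.
\end{aligned}
\]

Because $\epsilon > 0$ is arbitrary, we must have $\int_a^c f + \int_c^b f \le \int_a^b f$, so

\[
\int_a^c f + \int_c^b f = \int_a^b f,
\]

as desired. □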


The proof of Theorem 7.4.1 demonstrates some of the standard techniques 
involved for proving facts about the Riemann integral. The next result catalogs 
the remainder of the basic properties of the integral that we will need in our 
upcoming arguments. 

Theorem 7.4.2. Assume $f$ and $g$ are integrable functions on the interval $[a, b]$.

(i) The function $f + g$ is integrable on $[a, b]$ with $\int_a^b (f + g) = \int_a^b f + \int_a^b g$.

(ii) For $k \in \mathbf{R}$, the function $kf$ is integrable with $\int_a^b kf = k \int_a^b f$.

(iii) If $m \le f(x) \le M$ on $[a, b]$, then $m(b - a) \le \int_a^b f \le M(b - a)$.

(iv) If $f(x) \le g(x)$ on $[a, b]$, then $\int_a^b f \le \int_a^b g$.

(v) The function $|f|$ is integrable and $\left| \int_a^b f \right| \le \int_a^b |f|$.

Proof. Properties (i) and (ii) are reminiscent of the Algebraic Limit Theorem
and its many descendants (Theorems 2.3.3, 2.7.1, 4.2.4, and 5.2.4). In fact,
there is a way to use the Algebraic Limit Theorem for this argument as well.
An immediate corollary to Theorem 7.2.8 is that a function $f$ is integrable on
$[a, b]$ if and only if there exists a sequence of partitions $(P_n)$ satisfying

\[
\lim_{n \to \infty} \left[ U(f, P_n) - L(f, P_n) \right] = 0, \tag{1}
\]

and in this case $\int_a^b f = \lim U(f, P_n) = \lim L(f, P_n)$. (A proof for this was
requested as Exercise 7.2.3.)

To prove (ii) for the case $k \ge 0$, first verify that for any partition $P$ we have

\[
U(kf, P) = kU(f, P) \quad \text{and} \quad L(kf, P) = kL(f, P).
\]

Exercise 1.3.5 is used here. Because $f$ is integrable, there exist partitions $(P_n)$
satisfying (1). Turning our attention to the function $(kf)$, we see that

\[
\lim_{n \to \infty} \left[ U(kf, P_n) - L(kf, P_n) \right] = \lim_{n \to \infty} k \left[ U(f, P_n) - L(f, P_n) \right] = 0,
\]

and the formula in (ii) follows. The case where $k < 0$ is similar except that we
have

\[
U(kf, P_n) = kL(f, P_n) \quad \text{and} \quad L(kf, P_n) = kU(f, P_n).
\]

A proof for (i) can be constructed using similar methods and is requested in
Exercise 7.4.5.

To prove (iii), observe that

\[
U(f, P) \ge \int_a^b f \ge L(f, P)
\]

for any partition $P$. Statement (iii) follows if we take $P$ to be the trivial partition
consisting of only the endpoints $a$ and $b$.

For (iv), let $h = g - f$ and use (i), (ii), and (iii).

Because $-|f(x)| \le f(x) \le |f(x)|$ on $[a, b]$, statement (v) will follow from (iv)
provided that we can show that $|f|$ is actually integrable. The proof of this fact
is outlined in Exercise 7.4.1. □
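The swap of upper and lower sums in the case k < 0 of property (ii) can be checked on a concrete example. What follows is a minimal Python sketch rather than anything from the text: the choice f(x) = x on [0, 1], the uniform partition into 8 pieces, the constant k = -3, and the helper name `upper_lower` are all assumptions made for illustration. The helper relies on the sample functions being monotone, so that the supremum and infimum on each subinterval occur at its endpoints.

```python
from fractions import Fraction

def upper_lower(h, n):
    """U(h, P) and L(h, P) on the uniform partition of [0, 1] into n pieces.

    Assumes h is monotone, so sup and inf on each piece occur at its endpoints.
    """
    dx = Fraction(1, n)
    upper = lower = Fraction(0)
    for j in range(n):
        left, right = h(j * dx), h((j + 1) * dx)
        upper += max(left, right) * dx
        lower += min(left, right) * dx
    return upper, lower

# Hypothetical sample data: f(x) = x and a negative constant k.
f = lambda x: x
k = -3
kf = lambda x: k * f(x)

U_f, L_f = upper_lower(f, 8)
U_kf, L_kf = upper_lower(kf, 8)

# For k < 0 the roles of the upper and lower sums are exchanged.
print(U_kf == k * L_f, L_kf == k * U_f)
```

Exact rational arithmetic is used so that the two equalities can be compared without rounding error.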


To this point, the quantity $\int_a^b f$ is only defined in the case where $a < b$.

Definition 7.4.3. If $f$ is integrable on the interval $[a, b]$, define

\[
\int_b^a f = -\int_a^b f.
\]

Also, for $c \in [a, b]$ define

\[
\int_c^c f = 0.
\]

Definition 7.4.3 is a natural convention to simplify the algebra of integrals.
If $f$ is an integrable function on some interval $I$, then it is straightforward to
verify that the equation

\[
\int_a^b f = \int_a^c f + \int_c^b f
\]

from Theorem 7.4.1 remains valid for any three points $a$, $b$, and $c$ chosen in any
order from $I$.


Uniform Convergence and Integration 


If $(f_n)$ is a sequence of integrable functions on $[a, b]$, and if $f_n \to f$, then we are
inevitably going to want to know whether

\[
\int_a^b f_n \to \int_a^b f. \tag{2}
\]

This is an archetypical instance of one of the major themes of analysis: When 
does a mathematical manipulation such as integration respect the limiting pro- 
cess? 

If the convergence is pointwise, then any number of things can go wrong. It
is possible for each $f_n$ to be integrable but for the limit $f$ not to be integrable
(Exercise 7.3.5). Even if the limit function $f$ is integrable, equation (2) may fail
to hold. As an example of this, let

\[
f_n(x) = \begin{cases} n & \text{if } 0 < x < 1/n \\ 0 & \text{if } x = 0 \text{ or } x \ge 1/n. \end{cases}
\]

Each $f_n$ has two discontinuities on $[0, 1]$ and so is integrable with $\int_0^1 f_n = 1$.
For each $x \in [0, 1]$, we have $\lim f_n(x) = 0$ so that $f_n \to 0$ pointwise on $[0, 1]$.
But now observe that the limit function $f = 0$ certainly integrates to 0, and

\[
0 \ne \lim_{n \to \infty} \int_0^1 f_n.
\]

As a final remark on what can go wrong in (2), we should point out that it is
possible to modify this example to produce a situation where $\lim \int_0^1 f_n$ does not
even exist.
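The example above can be explored numerically. The following Python sketch is only an illustration under stated assumptions: the helper names `f` and `integral_f` and the sample point x = 1/7 are hypothetical choices, and the integral is computed from the area of the single rectangle of height n and base 1/n rather than from upper and lower sums.

```python
from fractions import Fraction

def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n) and 0 elsewhere on [0, 1]."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Exact integral of f_n over [0, 1]: height n times base length 1/n."""
    return n * Fraction(1, n)

# An arbitrary sample point in (0, 1], chosen only for illustration.
x = Fraction(1, 7)

for n in (1, 5, 10, 100, 1000):
    # Columns: n, the value f_n(x), and the integral of f_n over [0, 1].
    print(n, f(n, x), integral_f(n))
```

For a fixed x > 0 the values f_n(x) are eventually 0 (as soon as 1/n is at most x), while the integral stays equal to 1 for every n, which is the failure of (2) described in the text.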

One way to resolve all of these problems is to add the assumption of uniform 
convergence. 


Theorem 7.4.4 (Integrable Limit Theorem). Assume that $f_n \to f$ uniformly
on $[a, b]$ and that each $f_n$ is integrable. Then, $f$ is integrable and

\[
\lim_{n \to \infty} \int_a^b f_n = \int_a^b f.
\]

Proof. The proof that $f$ is integrable was requested as Exercise 7.2.5. The
properties of the integral listed in Theorem 7.4.2 allow us to assert that for
any $f_n$,

\[
\left| \int_a^b f_n - \int_a^b f \right| = \left| \int_a^b (f_n - f) \right| \le \int_a^b |f_n - f|.
\]

Let $\epsilon > 0$ be arbitrary. Because $f_n \to f$ uniformly, there exists an $N$ such that

\[
|f_n(x) - f(x)| < \frac{\epsilon}{b - a} \quad \text{for all } n \ge N \text{ and } x \in [a, b].
\]

Thus, for $n \ge N$ we see that

\[
\left| \int_a^b f_n - \int_a^b f \right| \le \int_a^b |f_n - f| \le \int_a^b \frac{\epsilon}{b - a} = \epsilon,
\]

and the result follows. □
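The estimate at the heart of this proof bounds the gap between the integrals by the largest pointwise gap times the length of the interval. Here is a small Python sketch of that bound on a sample sequence that is not taken from the text: f_n(x) = x^n / n on [0, 1], which converges uniformly to f = 0. The function names and the closed-form values (supremum 1/n attained at x = 1, integral 1/(n(n+1))) are assumptions worked out by hand for this particular sequence.

```python
from fractions import Fraction

def sup_distance(n):
    """sup over [0, 1] of |f_n(x) - 0| for f_n(x) = x**n / n, attained at x = 1."""
    return Fraction(1, n)

def integral_fn(n):
    """Exact value of the integral of x**n / n over [0, 1]."""
    return Fraction(1, n * (n + 1))

a, b = 0, 1
for n in (1, 2, 10, 100):
    gap = abs(integral_fn(n) - 0)        # | integral of f_n  -  integral of f |
    bound = sup_distance(n) * (b - a)    # (sup |f_n - f|) * (b - a)
    print(n, gap, bound, gap <= bound)
```

Because the supremum tends to 0, the bound forces the integrals of f_n to converge to the integral of the limit, which is the conclusion of the theorem in this special case.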


Exercises 

Exercise 7.4.1. Let $f$ be a bounded function on a set $A$, and set

\[
M = \sup\{f(x) : x \in A\}, \quad m = \inf\{f(x) : x \in A\},
\]
\[
M' = \sup\{|f(x)| : x \in A\}, \quad \text{and} \quad m' = \inf\{|f(x)| : x \in A\}.
\]

(a) Show that $M - m \ge M' - m'$.

(b) Show that if $f$ is integrable on the interval $[a, b]$, then $|f|$ is also integrable
on this interval.

(c) Provide the details for the argument that in this case we have $\left| \int_a^b f \right| \le \int_a^b |f|$.

Exercise 7.4.2. (a) Let $g(x) = x^3$, and classify each of the following as positive,
negative, or zero.

\[
\text{(i) } \int_0^{-1} g + \int_0^1 g \qquad \text{(ii) } \int_1^0 g + \int_0^1 g \qquad \text{(iii) } \int_1^{-2} g + \int_0^1 g.
\]

(b) Show that if $b \le a \le c$ and $f$ is integrable on the interval $[b, c]$, then it is
still the case that $\int_a^b f = \int_a^c f + \int_c^b f$.

Exercise 7.4.3. Decide which of the following conjectures is true and supply
a short proof. For those that are not true, give a counterexample.

(a) If $|f|$ is integrable on $[a, b]$, then $f$ is also integrable on this set.

(b) Assume $g$ is integrable and $g(x) \ge 0$ on $[a, b]$. If $g(x) > 0$ for an infinite
number of points $x \in [a, b]$, then $\int_a^b g > 0$.

(c) If $g$ is continuous on $[a, b]$ and $g(x) \ge 0$ with $g(y_0) > 0$ for at least one
point $y_0 \in [a, b]$, then $\int_a^b g > 0$.

Exercise 7.4.4. Show that if $f(x) > 0$ for all $x \in [a, b]$ and $f$ is integrable,
then $\int_a^b f > 0$.

Exercise 7.4.5. Let $f$ and $g$ be integrable functions on $[a, b]$.

(a) Show that if $P$ is any partition of $[a, b]$, then

\[
U(f + g, P) \le U(f, P) + U(g, P).
\]

Provide a specific example where the inequality is strict. What does the
corresponding inequality for lower sums look like?

(b) Review the proof of Theorem 7.4.2 (ii), and provide an argument for part
(i) of this theorem.

Exercise 7.4.6. Although not part of Theorem 7.4.2, it is true that the product 
of integrable functions is integrable. Provide the details for each step in the 
following proof of this fact: 

(a) If $f$ satisfies $|f(x)| \le M$ on $[a, b]$, show

\[
|(f(x))^2 - (f(y))^2| \le 2M |f(x) - f(y)|.
\]

(b) Prove that if $f$ is integrable on $[a, b]$, then so is $f^2$.

(c) Now show that if $f$ and $g$ are integrable, then $fg$ is integrable. (Consider
$(f + g)^2$.)

Exercise 7.4.7. Review the discussion immediately preceding Theorem 7.4.4.

(a) Produce an example of a sequence $f_n \to 0$ pointwise on $[0, 1]$ where
$\lim_{n \to \infty} \int_0^1 f_n$ does not exist.

(b) Produce an example of a sequence $g_n$ with $\int_0^1 g_n \to 0$ but $g_n(x)$ does not
converge to zero for any $x \in [0, 1]$. To make it more interesting, let’s insist
that $g_n(x) \ge 0$ for all $x$ and $n$.

Exercise 7.4.8. For each $n \in \mathbf{N}$, let

\[
h_n(x) = \begin{cases} 1/2^n & \text{if } 1/2^n < x \le 1 \\ 0 & \text{if } 0 \le x \le 1/2^n, \end{cases}
\]

and set $H(x) = \sum_{n=1}^{\infty} h_n(x)$. Show $H$ is integrable and compute $\int_0^1 H$.


234 


Chapter 7. The Riemann Integral 


Exercise 7.4.9. Let $g_n$ and $g$ be uniformly bounded on $[0, 1]$, meaning that
there exists a single $M > 0$ satisfying $|g(x)| \le M$ and $|g_n(x)| \le M$ for all $n \in \mathbf{N}$
and $x \in [0, 1]$. Assume $g_n \to g$ pointwise on $[0, 1]$ and uniformly on any set of
the form $[0, \alpha]$, where $0 < \alpha < 1$.

If all the functions are integrable, show that $\lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g$.

Exercise 7.4.10. Assume $g$ is integrable on $[0, 1]$ and continuous at 0. Show

\[
\lim_{n \to \infty} \int_0^1 g(x^n)\, dx = g(0).
\]

Exercise 7.4.11. Review the original definition of integrability in Section 7.2,
and in particular the definition of the upper integral $U(f)$. One reasonable suggestion
might be to bypass the complications introduced in Definition 7.2.7 and
simply define the integral to be the value of $U(f)$. Then every bounded function
is integrable! Although tempting, proceeding in this way has some significant
drawbacks. Show by example that several of the properties in Theorem 7.4.2 no
longer hold if we replace our current definition of integrability with the proposal
that $\int_a^b f = U(f)$ for every bounded function $f$.


7.5 The Fundamental Theorem of Calculus 


The derivative and the integral have been independently defined, each in its own 
rigorous mathematical terms. The definition of the derivative is motivated by 
the problem of finding slopes of tangent lines and is given in terms of functional 
limits of difference quotients. The definition of the integral grows out of the 
desire to calculate areas under nonconstant functions and is given in terms of 
supremums and infimums of finite sums. The Fundamental Theorem of Calculus 
reveals the remarkable inverse relationship between the two processes. 

The result is stated in two parts. The first is a computational statement 
that describes how an antiderivative can be used to evaluate an integral over 
a particular interval. The second statement is more theoretical in nature, ex- 
pressing the fact that every continuous function is the derivative of its indefinite 
integral. 


Theorem 7.5.1 (Fundamental Theorem of Calculus). (i) If $f : [a, b] \to \mathbf{R}$
is integrable, and $F : [a, b] \to \mathbf{R}$ satisfies $F'(x) = f(x)$ for all $x \in [a, b]$,
then

\[
\int_a^b f = F(b) - F(a).
\]

(ii) Let $g : [a, b] \to \mathbf{R}$ be integrable, and for $x \in [a, b]$, define

\[
G(x) = \int_a^x g.
\]

Then $G$ is continuous on $[a, b]$. If $g$ is continuous at some point $c \in [a, b]$,
then $G$ is differentiable at $c$ and $G'(c) = g(c)$.

Proof. (i) Let $P$ be a partition of $[a, b]$ and apply the Mean Value Theorem to
$F$ on a typical subinterval $[x_{k-1}, x_k]$ of $P$. This yields a point $t_k \in (x_{k-1}, x_k)$
where

\[
\begin{aligned}
F(x_k) - F(x_{k-1}) &= F'(t_k)(x_k - x_{k-1}) \\
&= f(t_k)(x_k - x_{k-1}).
\end{aligned}
\]

Now, consider the upper and lower sums $U(f, P)$ and $L(f, P)$. Because
$m_k \le f(t_k) \le M_k$ (where $m_k$ is the infimum on $[x_{k-1}, x_k]$ and $M_k$ is the supremum),
it follows that

\[
L(f, P) \le \sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] \le U(f, P).
\]

But notice that the sum in the middle telescopes so that

\[
\sum_{k=1}^{n} \left[ F(x_k) - F(x_{k-1}) \right] = F(b) - F(a),
\]

which is independent of the partition $P$. Thus we have

\[
L(f) \le F(b) - F(a) \le U(f).
\]

Because $L(f) = U(f) = \int_a^b f$, we conclude that $\int_a^b f = F(b) - F(a)$.

(ii) To prove the second statement, take $x > y$ in $[a, b]$ and observe that

\[
\begin{aligned}
|G(x) - G(y)| &= \left| \int_a^x g - \int_a^y g \right| = \left| \int_y^x g \right| \\
&\le \int_y^x |g| \\
&\le M(x - y),
\end{aligned}
\]

where $M > 0$ is a bound on $|g|$. This shows that $G$ is Lipschitz and so is
uniformly continuous on $[a, b]$ (Exercise 4.4.9).

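Now, let’s assume that $g$ is continuous at $c \in [a, b]$. In order to show that
$G'(c) = g(c)$, we rewrite the limit for $G'(c)$ as

\[
\begin{aligned}
\lim_{x \to c} \frac{G(x) - G(c)}{x - c} &= \lim_{x \to c} \frac{1}{x - c} \left( \int_a^x g(t)\, dt - \int_a^c g(t)\, dt \right) \\
&= \lim_{x \to c} \frac{1}{x - c} \left( \int_c^x g(t)\, dt \right).
\end{aligned}
\]

We would like to show that this limit equals $g(c)$. Thus, given an $\epsilon > 0$, we
must produce a $\delta > 0$ such that if $|x - c| < \delta$, then

\[
\left| \frac{1}{x - c} \left( \int_c^x g(t)\, dt \right) - g(c) \right| < \epsilon. \tag{1}
\]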




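The assumption of continuity of $g$ gives us control over the difference $|g(t) - g(c)|$.
In particular, we know that there exists a $\delta > 0$ such that

\[
|t - c| < \delta \quad \text{implies} \quad |g(t) - g(c)| < \epsilon.
\]

To take advantage of this, we cleverly write the constant $g(c)$ as

\[
g(c) = \frac{1}{x - c} \int_c^x g(c)\, dt
\]

and combine the two terms in equation (1) into a single integral. Keeping in
mind that $|x - c| \ge |t - c|$, we have that for all $|x - c| < \delta$,

\[
\begin{aligned}
\left| \frac{1}{x - c} \left( \int_c^x g(t)\, dt \right) - g(c) \right| &= \left| \frac{1}{x - c} \int_c^x \left( g(t) - g(c) \right) dt \right| \\
&\le \frac{1}{(x - c)} \int_c^x |g(t) - g(c)|\, dt \\
&< \frac{1}{(x - c)} \int_c^x \epsilon\, dt = \epsilon.
\end{aligned}
\]

□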
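The key step in part (i) of the proof is that F(b) - F(a) is trapped between the lower and upper sums of every partition. As a hedged illustration (not part of the original argument), the Python sketch below checks this for the sample pair f(x) = x^2 and F(x) = x^3/3 on [0, 1] with uniform partitions. The function names and the choice of example are assumptions; the code also relies on f being increasing on [0, 1], so the infimum and supremum on each subinterval sit at its left and right endpoints.

```python
from fractions import Fraction

def f(x):
    return x * x

def F(x):
    """An antiderivative of f, so that F'(x) = f(x)."""
    return x ** 3 / 3

def lower_upper_sums(n, a=Fraction(0), b=Fraction(1)):
    """L(f, P) and U(f, P) for the uniform partition of [a, b] into n pieces.

    f is increasing on [0, 1], so inf and sup on each piece are endpoint values.
    """
    pts = [a + (b - a) * Fraction(k, n) for k in range(n + 1)]
    lower = sum(f(pts[k - 1]) * (pts[k] - pts[k - 1]) for k in range(1, n + 1))
    upper = sum(f(pts[k]) * (pts[k] - pts[k - 1]) for k in range(1, n + 1))
    return lower, upper

target = F(Fraction(1)) - F(Fraction(0))
for n in (1, 4, 16, 64):
    lo, hi = lower_upper_sums(n)
    # Columns: n, L(f, P), F(b) - F(a), U(f, P).
    print(n, lo, target, hi)
```

In every row the middle value stays fixed while the outer two close in on it, mirroring the inequality L(f, P) ≤ F(b) - F(a) ≤ U(f, P) from the proof.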


Exercises 


Exercise 7.5.1. (a) Let $f(x) = |x|$ and define $F(x) = \int_{-1}^x f$. Find a piecewise
algebraic formula for $F(x)$ for all $x$. Where is $F$ continuous? Where
is $F$ differentiable? Where does $F'(x) = f(x)$?

(b) Repeat part (a) for the function

\[
f(x) = \begin{cases} 1 & \text{if } x < 0 \\ 2 & \text{if } x \ge 0. \end{cases}
\]


Exercise 7.5.2. Decide whether each statement is true or false, providing a 
short justification for each conclusion. 

(a) If $h' = g$ on $[a, b]$, then $g$ is continuous on $[a, b]$.

(b) If $g$ is continuous on $[a, b]$, then $g = h'$ for some $h$ on $[a, b]$.

(c) If $H(x) = \int_a^x h$ is differentiable at $c \in [a, b]$, then $h$ is continuous at $c$.

Exercise 7.5.3. The hypothesis in Theorem 7.5.1 (i) that $F'(x) = f(x)$ for all
$x \in [a, b]$ is slightly stronger than it needs to be. Carefully read the proof and
state exactly what needs to be assumed with regard to the relationship between
$f$ and $F$ for the proof to be valid.

Exercise 7.5.4. Show that if $f : [a, b] \to \mathbf{R}$ is continuous and $\int_a^x f = 0$ for all
$x \in [a, b]$, then $f(x) = 0$ everywhere on $[a, b]$. Provide an example to show that
this conclusion does not follow if $f$ is not continuous.

Exercise 7.5.5. The Fundamental Theorem of Calculus can be used to supply
a shorter argument for Theorem 6.3.1 under the additional assumption that the
sequence of derivatives is continuous.

Assume $f_n \to f$ pointwise and $f_n' \to g$ uniformly on $[a, b]$. Assuming each
$f_n'$ is continuous, we can apply Theorem 7.5.1 (i) to get

\[
\int_a^x f_n' = f_n(x) - f_n(a)
\]

for all $x \in [a, b]$. Show that $g(x) = f'(x)$.

Exercise 7.5.6 (Integration-by-parts). (a) Assume $h(x)$ and $k(x)$ have
continuous derivatives on $[a, b]$ and derive the familiar integration-by-parts
formula

\[
\int_a^b h(t) k'(t)\, dt = h(b)k(b) - h(a)k(a) - \int_a^b h'(t) k(t)\, dt.
\]

(b) Explain how the result in Exercise 7.4.6 can be used to slightly weaken
the hypothesis in part (a).

Exercise 7.5.7. Use part (ii) of Theorem 7.5.1 to construct another proof of
part (i) of Theorem 7.5.1 under the stronger hypothesis that $f$ is continuous.
(To get started, set $G(x) = \int_a^x f$.)

Exercise 7.5.8 (Natural Logarithm and Euler’s Constant). Let

\[
L(x) = \int_1^x \frac{1}{t}\, dt,
\]

where we consider only $x > 0$.

(a) What is $L(1)$? Explain why $L$ is differentiable and find $L'(x)$.

(b) Show that $L(xy) = L(x) + L(y)$. (Think of $y$ as a constant and differentiate
$g(x) = L(xy)$.)

(c) Show $L(x/y) = L(x) - L(y)$.

(d) Let

\[
\gamma_n = \left( 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \right) - L(n).
\]

Prove that $(\gamma_n)$ converges. The constant $\gamma = \lim \gamma_n$ is called Euler’s
constant.

(e) Show how consideration of the sequence $\gamma_{2n} - \gamma_n$ leads to the interesting
identity

\[
L(2) = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \cdots.
\]

Exercise 7.5.9. Given a function $f$ on $[a, b]$, define the total variation of $f$
to be

\[
Vf = \sup \left\{ \sum_{k=1}^{n} |f(x_k) - f(x_{k-1})| \right\},
\]

where the supremum is taken over all partitions $P$ of $[a, b]$.

(a) If $f$ is continuously differentiable ($f'$ exists as a continuous function), use
the Fundamental Theorem of Calculus to show $Vf \le \int_a^b |f'|$.

(b) Use the Mean Value Theorem to establish the reverse inequality and conclude
that $Vf = \int_a^b |f'|$.


Exercise 7.5.10 (Change-of-variable Formula). Let $g : [a, b] \to \mathbf{R}$ be differentiable
and assume $g'$ is continuous. Let $f : [c, d] \to \mathbf{R}$ be continuous, and
assume that the range of $g$ is contained in $[c, d]$ so that the composition $f \circ g$ is
properly defined.

(a) Why are we sure $f$ is the derivative of some function? How about $(f \circ g)g'$?

(b) Prove the change-of-variable formula

\[
\int_a^b f(g(x)) g'(x)\, dx = \int_{g(a)}^{g(b)} f(t)\, dt.
\]



Exercise 7.5.11. Assume $f$ is integrable on $[a, b]$ and has a “jump discontinuity”
at $c \in (a, b)$. This means that both one-sided limits exist as $x$ approaches
$c$ from the left and from the right, but that

\[
\lim_{x \to c^-} f(x) \ne \lim_{x \to c^+} f(x).
\]

(This phenomenon is discussed in more detail in Section 4.6.)

(a) Show that, in this case, $F(x) = \int_a^x f$ is not differentiable at $x = c$.

(b) The discussion in Section 5.5 mentions the existence of a continuous monotone
function that fails to be differentiable on a dense subset of $\mathbf{R}$. Combine
the results of part (a) with Exercise 6.4.10 to show how to construct
such a function.


7.6 Lebesgue’s Criterion for Riemann Integrability

We now return to our investigation of the relationship between continuity and
the Riemann integral. We have proved that continuous functions are integrable
and that the integral also exists for functions with only a finite number of discontinuities.
At the opposite end of the spectrum, we saw that Dirichlet’s function,
which is discontinuous at every point on [0, 1], fails to be Riemann-integrable.
The next examples show that the set of discontinuities of an integrable function
can be infinite and even uncountable. (These also appear as exercises in
Section 7.3.)


Riemann-integrable Functions with Infinite Discontinuities 

Recall from Section 4.1 that Thomae’s function

\[
t(x) = \begin{cases} 1 & \text{if } x = 0 \\ 1/n & \text{if } x = m/n \in \mathbf{Q} \setminus \{0\} \text{ is in lowest terms with } n > 0 \\ 0 & \text{if } x \notin \mathbf{Q} \end{cases}
\]

is continuous on the set of irrationals and has discontinuities at every rational
point. Let’s prove that Thomae’s function is integrable on $[0, 1]$ with $\int_0^1 t = 0$.

Let $\epsilon > 0$. The strategy, as usual, is to construct a partition $P_\epsilon$ of $[0, 1]$ for
which $U(t, P_\epsilon) - L(t, P_\epsilon) < \epsilon$.

Exercise 7.6.1. (a) First, argue that $L(t, P) = 0$ for any partition $P$ of $[0, 1]$.

(b) Consider the set of points $D_{\epsilon/2} = \{x : t(x) \ge \epsilon/2\}$. How big is $D_{\epsilon/2}$?

(c) To complete the argument, explain how to construct a partition $P_\epsilon$ of $[0, 1]$
so that $U(t, P_\epsilon) < \epsilon$.


We first met the Cantor set C in Section 3.1. We have since learned that C 
is a compact, uncountable subset of the interval [0, 1]. 

Exercise 7.6.2. Define

\[
h(x) = \begin{cases} 1 & \text{if } x \in C \\ 0 & \text{if } x \notin C. \end{cases}
\]

(a) Show $h$ has discontinuities at each point of $C$ and is continuous at every
point of the complement of $C$. Thus, $h$ is not continuous on an uncountably
infinite set.

(b) Now prove that $h$ is integrable on $[0, 1]$.


Sets of Measure Zero 

Thomae’s function fails to be continuous at each rational number in [0,1]. 
Although this set is infinite, we have seen that any infinite subset of Q is count- 
able. Countably infinite sets are the smallest type of infinite set. The Cantor 
set is uncountable, but it is also small in a sense that we are now ready to make 
precise. In the introduction to Chapter 3, we presented an argument that the 
Cantor set has zero “length.” The term “length” is awkward here because it 
really should only be applied to intervals or finite unions of intervals, which the 
Cantor set is not. There is a generalization of the concept of length to more 
general sets called the measure of a set. Of interest to our discussion are subsets
that have measure zero.

Definition 7.6.1. A set $A \subseteq \mathbf{R}$ has measure zero if, for all $\epsilon > 0$, there exists a
countable collection of open intervals $O_n$ with the property that $A$ is contained
in the union of all of the intervals $O_n$ and the sum of the lengths of all of the
intervals is less than or equal to $\epsilon$. More precisely, if $|O_n|$ refers to the length of
the interval $O_n$, then we have

\[
A \subseteq \bigcup_{n=1}^{\infty} O_n \quad \text{and} \quad \sum_{n=1}^{\infty} |O_n| \le \epsilon.
\]

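Example 7.6.2. Consider a finite set $A = \{a_1, a_2, \ldots, a_N\}$. To show that $A$
has measure zero, let $\epsilon > 0$ be arbitrary. For each $1 \le n \le N$, construct the
interval

\[
G_n = \left( a_n - \frac{\epsilon}{2N},\; a_n + \frac{\epsilon}{2N} \right).
\]

Clearly, $A$ is contained in the union of these intervals, and

\[
\sum_{n=1}^{N} |G_n| = \sum_{n=1}^{N} \frac{\epsilon}{N} = \epsilon.
\]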
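The construction in Example 7.6.2 is simple enough to carry out directly. The Python sketch below is only an illustration: the function name `cover_finite_set`, the four sample points, and the value of ε are hypothetical choices, and exact fractions are used so that the total length can be compared with ε without rounding.

```python
from fractions import Fraction

def cover_finite_set(points, eps):
    """Return the intervals (a_n - eps/(2N), a_n + eps/(2N)), one around each point."""
    N = len(points)
    half = Fraction(eps) / (2 * N)
    return [(a - half, a + half) for a in points]

# Sample finite set and tolerance, chosen only for illustration.
A = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(7, 8)]
eps = Fraction(1, 100)

intervals = cover_finite_set(A, eps)
total_length = sum(right - left for left, right in intervals)

# Each interval has length eps/N, so the N lengths add up to exactly eps.
print(total_length == eps)
```

Each point sits at the center of its own interval, so the union covers A, and the lengths sum to ε as in the displayed computation.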

Exercise 7.6.3. Show that any countable set has measure zero. 

Exercise 7.6.4. Prove that the Cantor set has measure zero. 

Exercise 7.6.5. Show that if two sets A and B each have measure zero, then 
A U B has measure zero as well. In addition, discuss the proof of the stronger 
statement that the countable union of sets of measure zero also has measure 
zero. (This second statement is true, but a completely rigorous proof requires 
a result about double summations discussed in Section 2.8.) 


α-Continuity

Definition 7.6.3. Let $f$ be defined on $[a, b]$, and let $\alpha > 0$. The function $f$ is
α-continuous at $x \in [a, b]$ if there exists $\delta > 0$ such that for all $y, z \in (x - \delta, x + \delta)$
it follows that $|f(y) - f(z)| < \alpha$.

Let $f$ be a bounded function on $[a, b]$. For each $\alpha > 0$, define $D_\alpha$ to be the
set of points in $[a, b]$ where the function $f$ fails to be α-continuous; that is,

\[
D_\alpha = \{x \in [a, b] : f \text{ is not } \alpha\text{-continuous at } x\}. \tag{1}
\]

The concept of α-continuity was previously introduced in Section 4.6. Several
of the ensuing exercises appeared as exercises in this section as well.

Exercise 7.6.6. If $\alpha < \alpha'$, show that $D_{\alpha'} \subseteq D_\alpha$.

Now, let

\[
D = \{x \in [a, b] : f \text{ is not continuous at } x\}. \tag{2}
\]

Exercise 7.6.7. (a) Let $\alpha > 0$ be given. Show that if $f$ is continuous at
$x \in [a, b]$, then it is α-continuous at $x$ as well. Explain how it follows that
$D_\alpha \subseteq D$.

(b) Show that if $f$ is not continuous at $x$, then $f$ is not α-continuous for some
$\alpha > 0$. Now, explain why this guarantees that

\[
D = \bigcup_{n=1}^{\infty} D_{\alpha_n} \quad \text{where } \alpha_n = 1/n.
\]

Exercise 7.6.8. Prove that for a fixed $\alpha > 0$, the set $D_\alpha$ is closed.


Just as with continuity, α-continuity is defined pointwise, and just as with
continuity, uniformity is going to play an important role.

For a fixed $\alpha > 0$, a function $f : A \to \mathbf{R}$ is uniformly α-continuous on $A$
if there exists a $\delta > 0$ such that whenever $x$ and $y$ are points in $A$ satisfying
$|x - y| < \delta$, it follows that $|f(x) - f(y)| < \alpha$. By imitating the proof of
Theorem 4.4.7, it is completely straightforward to show that if $f$ is α-continuous
at every point on some compact set $K$, then $f$ is uniformly α-continuous on $K$.


Compactness Revisited 

Compactness of subsets of the real line can be described in three equivalent 
ways. The following theorem appears toward the end of Section 3.3. 

Theorem 7.6.4. Let $K \subseteq \mathbf{R}$. The following three statements are all equivalent,
in the sense that if any one is true, then so are the two others.

(i) Every sequence contained in $K$ has a convergent subsequence that converges
to a limit in $K$.

(ii) $K$ is closed and bounded.

(iii) Given a collection of open intervals $\{G_\lambda : \lambda \in \Lambda\}$ that covers $K$ (that is,
$K \subseteq \bigcup_{\lambda \in \Lambda} G_\lambda$) there exists a finite subcollection $\{G_{\lambda_1}, G_{\lambda_2}, G_{\lambda_3}, \ldots, G_{\lambda_N}\}$
of the original set that also covers $K$.

The equivalence of (i) and (ii) has been used throughout the core material 
in the text. Characterization (iii) has been less central but is essential to the 
upcoming argument. If the characterization of compactness in terms of open 
covers is not familiar, take a moment to review the second half of Section 3.3 
and complete the proof that (i) and (ii) imply (iii) outlined in Exercise 3.3.9. 

Lebesgue’s Theorem 

We are now prepared to completely categorize the collection of Riemann- 
integrable functions in terms of continuity. 




Theorem 7.6.5 (Lebesgue’s Theorem). Let $f$ be a bounded function defined
on the interval $[a, b]$. Then, $f$ is Riemann-integrable if and only if the set of
points where $f$ is not continuous has measure zero.

Proof. Let $M > 0$ satisfy $|f(x)| \le M$ for all $x \in [a, b]$, and let $D$ and $D_\alpha$ be
defined as in the preceding equations (1) and (2). Let’s first assume that $D$ has
measure zero and prove that our function is integrable.

($\Leftarrow$) Let $\epsilon > 0$ and set

\[
\alpha = \frac{\epsilon}{2(b - a)}.
\]

Exercise 7.6.9. Show that there exists a finite collection of disjoint open intervals
$\{G_1, G_2, \ldots, G_N\}$ whose union contains $D_\alpha$ and that satisfies

\[
\sum_{n=1}^{N} |G_n| < \frac{\epsilon}{4M}.
\]

Exercise 7.6.10. Let $K$ be what remains of the interval $[a, b]$ after the open
intervals $G_n$ are all removed; that is, $K = [a, b] \setminus \bigcup_{n=1}^{N} G_n$. Argue that $f$ is
uniformly α-continuous on $K$.

Exercise 7.6.11. Finish the proof in this direction by explaining how to construct
a partition $P_\epsilon$ of $[a, b]$ such that $U(f, P_\epsilon) - L(f, P_\epsilon) < \epsilon$. It will be helpful
to break the sum

\[
U(f, P_\epsilon) - L(f, P_\epsilon) = \sum_{k=1}^{n} (M_k - m_k) \Delta x_k
\]

into two parts: one over those subintervals that contain points of $D_\alpha$ and the
other over subintervals that do not.

($\Rightarrow$) For the other direction, assume $f$ is Riemann-integrable. We must argue
that the set $D$ of discontinuities of $f$ has measure zero.

Let $\epsilon > 0$ be arbitrary, and fix $\alpha > 0$. Because $f$ is Riemann-integrable,
there exists a partition $P_\epsilon$ of $[a, b]$ such that $U(f, P_\epsilon) - L(f, P_\epsilon) < \alpha\epsilon$.

Exercise 7.6.12. (a) Prove that $D_\alpha$ has measure zero. Point out that it is
possible to choose a cover for $D_\alpha$ that consists of a finite number of open
intervals.

(b) Show how this implies that $D$ has measure zero. □

Our main agenda in the remainder of this section is to employ Lebesgue’s
Theorem in our pursuit of a non-integrable derivative, but this elegant result
has a number of other applications.

Exercise 7.6.13. (a) Show that if $f$ and $g$ are integrable on $[a, b]$, then so is
the product $fg$. (This result was requested in Exercise 7.4.6, but notice
how much easier the argument is now.)

(b) Show that if $g$ is integrable on $[a, b]$ and $f$ is continuous on the range of
$g$, then the composition $f \circ g$ is integrable on $[a, b]$.

If we instead assume that $f$ is integrable and $g$ is continuous, it actually
doesn’t follow that the composition $f \circ g$ is an integrable function. Producing a
counterexample, however, requires a few more ingredients.


A Nonintegrable Derivative 


To this point, our one example of a nonintegrable function is Dirichlet’s nowhere-continuous
function. We close this section with another example that has special
significance. The content of the Fundamental Theorem of Calculus is that integration
and differentiation are inverse processes of each other. If a function $f$ is
differentiable on $[a, b]$, then part (i) of the Fundamental Theorem tells us that

\[
\int_a^b f' = f(b) - f(a), \tag{3}
\]

provided $f'$ is integrable. But shouldn’t $f'$ be integrable just by virtue of being
a derivative? A curious side-effect of staring at equation (3) for any length of
time is that it starts to feel as though every derivative should be integrable
because we have an obvious candidate for what the value of the integral ought
to be. Alas, for the Riemann integral at least, reality comes up short of our
expectations. What follows is the construction of a differentiable function $f$ for
which equation (3) fails because $\int_a^b f'$ does not exist.

We will once again be interested in the Cantor set

\[
C = \bigcap_{n=0}^{\infty} C_n,
\]

defined in Section 3.1. As an initial step, let’s create a function $f(x)$ that is
differentiable on $[0, 1]$ and whose derivative $f'(x)$ has discontinuities at every
point of $C$. The key ingredient for this construction is the function

\[
g(x) = \begin{cases} x^2 \sin(1/x) & \text{if } x > 0 \\ 0 & \text{if } x \le 0. \end{cases}
\]

Exercise 7.6.14. (a) Find $g'(0)$.

(b) Use the standard rules of differentiation to compute $g'(x)$ for $x \ne 0$.

(c) Explain why, for every $\delta > 0$, $g'(x)$ attains every value between 1 and $-1$
as $x$ ranges over the set $(-\delta, \delta)$. Conclude that $g'$ is not continuous at
$x = 0$.


Now, we want to transport the behavior of $g$ around zero to each of the endpoints
of the closed intervals that make up the sets $C_n$ used in the definition of
the Cantor set. The formulas are awkward but the basic idea is straightforward.
Start by setting

\[
f_0(x) = 0 \quad \text{on } C_0 = [0, 1].
\]

To define $f_1$ on $[0, 1]$, first assign

\[
f_1(x) = 0 \quad \text{for all } x \in C_1 = \left[ 0, \tfrac{1}{3} \right] \cup \left[ \tfrac{2}{3}, 1 \right].
\]

In the remaining open middle third, put translated “copies” of $g$ oscillating
toward the two endpoints (Fig. 7.3). In terms of a formula, we have

\[
f_1(x) = \begin{cases} 0 & \text{if } x \in [0, 1/3] \\ g(x - 1/3) & \text{if } x \text{ is just to the right of } 1/3 \\ g(-x + 2/3) & \text{if } x \text{ is just to the left of } 2/3 \\ 0 & \text{if } x \in [2/3, 1]. \end{cases}
\]

Finally, we splice the two oscillating pieces of $f_1$ together in a way that makes
$f_1$ differentiable and such that

\[
|f_1(x)| \le (x - 1/3)^2 \quad \text{and} \quad |f_1(x)| \le (-x + 2/3)^2.
\]

This splicing is no great feat, and we will skip the details so as to keep our
attention focused on the two endpoints 1/3 and 2/3. These are the points
where $f_1'(x)$ fails to be continuous.

To define $f_2(x)$, we start with $f_1(x)$ and do the same trick as before, this
time in the two open intervals (1/9, 2/9) and (7/9, 8/9). The result (Fig. 7.4)
is a differentiable function that is zero on $C_2$ and has a derivative that is not
continuous on the set

\[
\left\{ \frac{1}{9}, \frac{2}{9}, \frac{1}{3}, \frac{2}{3}, \frac{7}{9}, \frac{8}{9} \right\}.
\]


Continuing in this fashion yields a sequence of functions $f_0, f_1, f_2, \ldots$ defined
on $[0, 1]$.

Exercise 7.6.15. (a) If $c \in C$, what is $\lim_{n \to \infty} f_n(c)$?

(b) Why does $\lim_{n \to \infty} f_n(x)$ exist for $x \notin C$?

Now, set

\[
f(x) = \lim_{n \to \infty} f_n(x).
\]

Exercise 7.6.16. (a) Explain why $f'(x)$ exists for all $x \notin C$.

(b) If $c \in C$, argue that $|f(x)| \le (x - c)^2$ for all $x \in [0, 1]$. Show how this
implies $f'(c) = 0$.

(c) Give a careful argument for why $f'(x)$ fails to be continuous on $C$. Remember
that $C$ contains many points besides the endpoints of the intervals
that make up $C_1, C_2, C_3, \ldots$.

Let’s take inventory of the situation. Our goal is to create a nonintegrable
derivative. Our function $f(x)$ is differentiable, and $f'$ fails to be continuous on
$C$. We are not quite done.

Exercise 7.6.17. Why is $f'(x)$ Riemann-integrable on $[0, 1]$?

The reason the Cantor set has measure zero is that, at each stage, $2^{n-1}$ open
intervals of length $1/3^n$ are removed from $C_{n-1}$. The resulting sum

\[
\sum_{n=1}^{\infty} 2^{n-1} \left( \frac{1}{3^n} \right)
\]

converges to one, which means that the approximating sets $C_1, C_2, C_3, \ldots$ have
total lengths tending to zero. Instead of removing open intervals of length $1/3^n$
at each stage, let’s see what happens when we remove intervals of length $1/3^{n+1}$.
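For the standard Cantor set, the claim that the removed lengths add up to one can be watched stage by stage. The following Python sketch is an illustration only; the function name `removed_length` and the stages printed are assumptions. It treats just the original construction, with intervals of length 1/3^n, and leaves the modified construction for the exercise that follows.

```python
from fractions import Fraction

def removed_length(N):
    """Total length removed through stage N: the sum of 2**(n-1) / 3**n for n = 1..N."""
    return sum(Fraction(2 ** (n - 1), 3 ** n) for n in range(1, N + 1))

for N in (1, 2, 5, 10, 20):
    removed = removed_length(N)
    # Columns: stage N, length removed so far, length remaining in C_N (exact and as a float).
    print(N, removed, 1 - removed, float(1 - removed))
```

The partial sums are geometric, so the length remaining after stage N works out to (2/3)^N, which tends to zero.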

Exercise 7.6.18. Show that, under these circumstances, the sum of the lengths
of the intervals making up each $C_n$ no longer tends to zero as $n \to \infty$. What is
this limit?

Figure 7.5: A differentiable function with a non-integrable derivative.

If we again take the intersection $\bigcap_{n=0}^{\infty} C_n$, the result is a Cantor-type set with
the same topological properties — it is closed, compact, perfect, and contains 
no intervals. But a consequence of the previous exercise is that it no longer 
has measure zero. This is just what we need to define our desired function. 
By repeating the preceding construction of $f(x)$ on this new Cantor-type set
of strictly positive measure, we get a differentiable function whose derivative 
has too many points of discontinuity (Fig. 7.5). By Lebesgue’s Theorem, this 
derivative cannot be integrated using the Riemann integral. 

Exercise 7.6.19. As a final gesture, provide the example advertised in Exercise 7.6.13
of an integrable function $f$ and a continuous function $g$ where the
composition $f \circ g$ is properly defined but not integrable. Exercise 4.3.12 may
be useful.


7.7 Epilogue 

Riemann’s definition of the integral was a modification of Cauchy’s integral, 
which was originally designed for the purpose of integrating continuous func- 
tions. In this goal, the Riemann integral was a complete success. For continuous 
functions at least, the process of integration now stood on its own rigorous foot- 
ing, defined independently of differentiation. As analysis progressed, however, 
the dependence of integrability on continuity became problematic. The last 
example of Section 7.6 highlights one type of weakness: not every derivative 
can be integrated. Another limitation of the Riemann integral arises in asso- 
ciation with limits of sequences of functions. To get a sense of this, let’s once 
again consider Dirichlet’s function g(x) introduced in Section 4.1. Recall that 
g(x) = 1 whenever x is rational, and g(x) = 0 at every irrational point. Focusing 
on the interval [0, 1] for a moment, let 


$$\{r_1, r_2, r_3, r_4, \ldots\}$$

be an enumeration of the countable number of rational points in this interval.
Now, let $g_1(x) = 1$ if $x = r_1$ and define $g_1(x) = 0$ otherwise. Next, define
$g_2(x) = 1$ if $x$ is either $r_1$ or $r_2$, and let $g_2(x) = 0$ at all other points. In general,
for each $n \in \mathbb{N}$, define

$$g_n(x) = \begin{cases} 1 & \text{if } x \in \{r_1, r_2, \ldots, r_n\} \\ 0 & \text{otherwise.} \end{cases}$$
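
To make the definition of $g_n$ concrete, here is a small sketch using exact rational arithmetic. The text leaves the enumeration arbitrary; the particular ordering below (0, 1, then fractions by increasing denominator) is our own assumption, as are the names `rationals_in_unit_interval`, `first_rationals`, and `g_n`.

```python
from fractions import Fraction

def rationals_in_unit_interval():
    """Yield each rational in [0, 1] exactly once: 0, 1, 1/2, 1/3, 2/3, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            r = Fraction(p, q)
            if r.denominator == q:  # skip repeats such as 2/4 = 1/2
                yield r
        q += 1

def first_rationals(n):
    """Return the list [r_1, ..., r_n] for the enumeration above."""
    gen = rationals_in_unit_interval()
    return [next(gen) for _ in range(n)]

def g_n(n, x):
    """g_n(x) = 1 if x is among r_1, ..., r_n and 0 otherwise."""
    return 1 if x in first_rationals(n) else 0

# A fixed rational point is eventually "switched on" and stays on.
x = Fraction(2, 3)
print([g_n(n, x) for n in range(1, 9)])
```

For a fixed rational $x$, the values $g_n(x)$ are 0 until $x$ appears in the enumeration and 1 from then on, which is the pointwise behavior the discussion relies on.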


Notice that each $g_n$ has only a finite number of discontinuities and so is Riemann-integrable
with $\int_0^1 g_n = 0$. But we also have $g_n \to g$ pointwise on the
interval $[0, 1]$. The problem arises when we remember that Dirichlet’s nowhere-continuous
function is not Riemann-integrable. Thus, the equation

$$\lim_{n \to \infty} \int_0^1 g_n = \int_0^1 g \tag{1}$$

fails to hold, not because the values on each side of the equal sign are different 
but because the value on the right-hand side does not exist. The content of Theorem 7.4.4
is that this equation does hold whenever we have $g_n \to g$ uniformly.
This is a reasonable way to resolve the situation, but it is a bit unsatisfying 
because the deficiency in this case is not entirely with the type of convergence 
but lies in the strength of the Riemann integral. If we could make sense of the 
right-hand side via some other definition of integration, then maybe equation 
(1) would actually be true. 

Such a definition was introduced by Henri Lebesgue in 1901. Generally
speaking, Lebesgue’s integral is constructed using a generalization of length 
called the measure of a set. In the previous section, we studied sets of measure 
zero. In particular, we showed that the rational numbers in [0,1] (because they 
are countable) have measure zero. The irrational numbers in [0,1] have measure 
one. This should not be too surprising because we now have that the measures 
of these two disjoint sets add up to the length of the interval [0,1]. Rather 
than chopping up the x-axis to approximate the area under the curve, Lebesgue 
suggested partitioning the y-axis. In the case of Dirichlet’s function g, there
are only two range values — zero and one. The integral, according to Lebesgue, 
could be defined via 



$$\begin{aligned} \int_0^1 g &= 1 \cdot [\text{measure of set where } g = 1] + 0 \cdot [\text{measure of set where } g = 0] \\ &= 1 \cdot 0 + 0 \cdot 1 = 0. \end{aligned}$$

With this interpretation of $\int_0^1 g$, equation (1) is now valid!

The Lebesgue integral is presently the standard integral in advanced math- 
ematics. The theory is taught to all graduate students, as well as to many 
undergraduates, and it is the integral used in most research papers where inte- 
gration is required. The Lebesgue integral generalizes the Riemann integral in 
the sense that any function that is Riemann-integrable is Lebesgue-integrable 
and integrates to the same value. The real strength of the Lebesgue integral 
is that the class of integrable functions is much larger. Most importantly, this
class includes the limits of different types of Cauchy sequences of integrable 
functions. This leads to a group of extremely important convergence theorems 
related to equation (1) with hypotheses much weaker than the uniform conver- 
gence assumed in Theorem 7.4.4. 

Despite its prevalence, the Lebesgue integral does have a few drawbacks. 
There are functions whose improper Riemann integrals exist but that are not 
Lebesgue-integrable. Another disappointment arises from the relationship be- 
tween integration and differentiation. Even with the Lebesgue integral, it is still 
not possible to prove 

$$\int_a^b f' = f(b) - f(a)$$

without some additional assumptions on $f$. Around 1960, a new integral was
proposed that can integrate a larger class of functions than either the Riemann 
integral or the Lebesgue integral and suffers from neither of the preceding 
weaknesses. Remarkably, this integral is actually a return to Riemann’s orig- 
inal technique for defining integration, with some small modifications in how 
we describe the “fineness” of the partitions. An introduction to the generalized 
Riemann integral is the topic of Section 8.1. 


Chapter 8 

Additional Topics 


The foundation in analysis provided by the first seven chapters is sufficient 
background for the exploration of some advanced and historically important 
topics. The writing in this chapter is similar to that in the concluding project 
sections of each individual chapter. Exercises are included within the exposition 
and are designed to make each section a narrative investigation into a significant 
achievement in the field of analysis. 


8.1 The Generalized Riemann Integral 


Chapter 7 concluded with Henri Lebesgue’s elegant result that a bounded func- 
tion is Riemann-integrable if and only if its points of discontinuity form a set 
of measure zero. To eliminate the dependence of integrability on continuity, 
Lebesgue proposed a new method of integration that has become the standard 
integral in mathematics. In the Epilogue to Chapter 7, we briefly outlined some 
of the strengths and weaknesses of the Lebesgue integral, concluding with a look 
back to the Fundamental Theorem of Calculus (Theorem 7.5.1). (Lebesgue’s 
measure-zero criterion is not a prerequisite for understanding the material in 
this section, but the discussion in Section 7.7 provides some useful context for 
what follows.) 

If $F$ is a differentiable function on $[a, b]$, then in a perfect world we might
hope to prove that

$$\int_a^b F' = F(b) - F(a). \tag{1}$$

Notice that although this is the conclusion of part (i) of Theorem 7.5.1, there 
we needed the additional requirement that F' be Riemann-integrable. To drive 
this point home, Section 7.6 concluded with an example of a function that has 
a derivative that the Riemann integral cannot handle. The Lebesgue integral
alluded to earlier is a significant improvement. It can integrate our example 
from Section 7.6, but ultimately it too suffers from the same setback. Not every 
derivative is integrable, no matter which integral is used. 

What follows is a short introduction to the generalized Riemann integral, dis- 
covered independently around 1960 by Jaroslav Kurzweil and Ralph Henstock. 
As mentioned in Section 7.7, this lesser-known modification of the Riemann 
integral can actually integrate a larger class of functions than Lebesgue’s ubiq- 
uitous integral and yields a surprisingly simple proof of equation (1) above with 
no additional hypotheses. 


The Riemann Integral as a Limit 


Let

$$P = \{a = x_0 < x_1 < x_2 < \cdots < x_n = b\}$$

be a partition of $[a, b]$. A tagged partition is one where in addition to $P$ we have
chosen points $c_k$ in each of the subintervals $[x_{k-1}, x_k]$. This sets the stage for
the concept of a Riemann sum. Given a function $f : [a, b] \to \mathbb{R}$, and a tagged
partition $(P, \{c_k\}_{k=1}^n)$, the Riemann sum generated by this partition is given by

$$R(f, P) = \sum_{k=1}^{n} f(c_k)(x_k - x_{k-1}).$$


Looking back at the definition of the upper sum

$$U(f, P) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}) \quad \text{where } M_k = \sup\{f(x) : x \in [x_{k-1}, x_k]\},$$

and the lower sum

$$L(f, P) = \sum_{k=1}^{n} m_k (x_k - x_{k-1}) \quad \text{where } m_k = \inf\{f(x) : x \in [x_{k-1}, x_k]\},$$

it should be clear that

$$L(f, P) \le R(f, P) \le U(f, P)$$

for any bounded function $f$. In Definition 7.2.7, we characterized integrability
by insisting that the infimum of the upper sums equal the supremum of the
lower sums. Any Riemann sum is going to fall between a particular upper and
lower sum. If the upper and lower sums are converging to some common value,
then the Riemann sums are eventually close to this value as well. The next
theorem shows that it is possible to characterize Riemann integrability in a way
equivalent to Definition 7.2.7 using an $\epsilon$-$\delta$-type definition applied to Riemann
sums.
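
The sandwich $L(f, P) \le R(f, P) \le U(f, P)$ can be seen numerically in a simple case. The sketch below assumes an increasing function, so that the infimum and supremum on each subinterval are attained at the left and right endpoints; the function names and the sample partition and tags are our own choices for illustration.

```python
def riemann_sum(f, points, tags):
    """R(f, P) = sum of f(c_k) * (x_k - x_{k-1}) for a tagged partition."""
    return sum(f(c) * (x1 - x0)
               for x0, x1, c in zip(points, points[1:], tags))

def lower_upper_sums_increasing(f, points):
    """L(f, P) and U(f, P) for an increasing f.

    For an increasing function the infimum on [x_{k-1}, x_k] is
    f(x_{k-1}) and the supremum is f(x_k).
    """
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in zip(points, points[1:]))
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in zip(points, points[1:]))
    return lower, upper

def square(x):
    return x * x

points = [0.0, 0.25, 0.5, 0.75, 1.0]   # partition P of [0, 1]
tags = [0.1, 0.3, 0.6, 0.9]            # one tag c_k per subinterval
lower, upper = lower_upper_sums_increasing(square, points)
middle = riemann_sum(square, points, tags)
print(lower, middle, upper)
```

Changing the tags moves the middle value around, but never outside the interval determined by the lower and upper sums for the same partition.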

Definition 8.1.1. Let $\delta > 0$. A partition $P$ is $\delta$-fine if every subinterval $[x_{k-1}, x_k]$
satisfies $x_k - x_{k-1} < \delta$. In other words, every subinterval has width
less than $\delta$.

Theorem 8.1.2 (Limit Criterion for Riemann Integrability). A bounded
function $f : [a, b] \to \mathbb{R}$ is Riemann-integrable with

$$\int_a^b f = A$$

if and only if, for every $\epsilon > 0$, there exists a $\delta > 0$ such that, for any tagged
partition $(P, \{c_k\})$ that is $\delta$-fine, it follows that

$$|R(f, P) - A| < \epsilon.$$


Before attempting the proof, we should point out that, in some treatments, 
the criterion in Theorem 8.1.2 is actually taken as the definition of Riemann inte- 
grability. In fact, this is how Riemann originally defined the concept. The spirit 
of this theorem is close to what is taught in most introductory calculus courses. 
To approximate the area under the curve, Riemann sums are constructed. The 
hope is that as the partitions become finer, the corresponding approximations 
get closer to the value of the integral. The content of Theorem 8.1.2 is that 
if the function is integrable, then these approximations do indeed converge to 
the value of the integral, regardless of how the tags are chosen. Conversely, if 
the approximating Riemann sums for finer and finer partitions collect around 
some value A , then the function is integrable and integrates to A. 
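
As an informal illustration of this convergence, the following sketch builds random $\delta$-fine tagged partitions of $[0, 1]$ and compares the Riemann sums of $f(x) = x^2$ with the known value $1/3$. The helper names, the random construction of partitions, and the seed are assumptions made for the example only.

```python
import random

def random_delta_fine_tagged_partition(a, b, delta, rng):
    """Build a random tagged partition of [a, b] whose subintervals
    all have width strictly less than delta."""
    points = [a]
    while points[-1] < b:
        step = rng.uniform(0.25 * delta, 0.99 * delta)
        points.append(min(points[-1] + step, b))
    # One randomly placed tag in each subinterval.
    tags = [rng.uniform(x0, x1) for x0, x1 in zip(points, points[1:])]
    return points, tags

def riemann_sum(f, points, tags):
    """R(f, P) for the tagged partition (points, tags)."""
    return sum(f(c) * (x1 - x0)
               for x0, x1, c in zip(points, points[1:], tags))

def square(x):
    return x * x

rng = random.Random(0)
for delta in (0.1, 0.01, 0.001):
    points, tags = random_delta_fine_tagged_partition(0.0, 1.0, delta, rng)
    error = abs(riemann_sum(square, points, tags) - 1.0 / 3.0)
    print(delta, error)
```

Because $x^2$ is increasing on $[0, 1]$, the gap between the upper and lower sums of a $\delta$-fine partition is less than $\delta$, so each printed error should be below the corresponding $\delta$ however the tags happen to fall.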

Proof. ($\Rightarrow$) For the forward direction, we begin with the assumption that $f$ is
integrable on $[a, b]$. Given an $\epsilon > 0$, we must produce a $\delta > 0$ such that if
$(P, \{c_k\})$ is any tagged partition that is $\delta$-fine, then $|R(f, P) - \int_a^b f| < \epsilon$.
Because $f$ is integrable, we know there exists a partition $P_\epsilon$ such that

$$U(f, P_\epsilon) - L(f, P_\epsilon) < \frac{\epsilon}{3}.$$

Let $M > 0$ be a bound on $|f|$, and let $n$ be the number of subintervals of $P_\epsilon$ (so
that $P_\epsilon$ really consists of $n + 1$ points in $[a, b]$). We will argue that choosing

$$\delta = \frac{\epsilon}{9nM}$$

has the desired property.

Here is the idea. Let $(P, \{c_k\})$ be an arbitrary tagged partition of $[a, b]$ that
is $\delta$-fine, and let $P' = P \cup P_\epsilon$. The key is to establish the string of inequalities

$$L(f, P') - \frac{\epsilon}{3} < L(f, P) \le U(f, P) < U(f, P') + \frac{\epsilon}{3}.$$

Exercise 8.1.1. (a) Explain why both the Riemann sum $R(f, P)$ and $\int_a^b f$
fall between $L(f, P)$ and $U(f, P)$.

(b) Explain why $U(f, P') - L(f, P') < \epsilon/3$.

By the previous exercise, if we can show $U(f, P) < U(f, P') + \epsilon/3$ (and
similarly $L(f, P') - \epsilon/3 < L(f, P)$), then it will follow that

$$\left| R(f, P) - \int_a^b f \right| < \epsilon$$

and the proof will be done. Thus, we turn our attention toward estimating the
distance between $U(f, P)$ and $U(f, P')$.


Exercise 8.1.2. Explain why $U(f, P) - U(f, P') \ge 0$.

A typical term in either $U(f, P)$ or $U(f, P')$ has the form $M_k(x_k - x_{k-1})$,
where $M_k$ is the supremum of $f$ over $[x_{k-1}, x_k]$. A good number of these terms
appear in both upper sums and so cancel out.

Exercise 8.1.3. (a) In terms of $n$, what is the largest number of terms of the
form $M_k(x_k - x_{k-1})$ that could appear in one of $U(f, P)$ or $U(f, P')$ but
not the other?

(b) Finish the proof in this direction by arguing that

$$U(f, P) - U(f, P') < \epsilon/3.$$

($\Leftarrow$) For this direction, we assume that the $\epsilon$-$\delta$ criterion in Theorem 8.1.2
holds and argue that $f$ is integrable. Integrability, as we have defined it, depends
on our ability to choose partitions for which the upper sums are close to the
lower sums. We have remarked that given any partition $P$, it is always the case
that

$$L(f, P) \le R(f, P) \le U(f, P)$$

no matter which tags are chosen to compute $R(f, P)$.

Exercise 8.1.4. (a) Show that if $f$ is continuous, then it is possible to pick
tags $\{c_k\}_{k=1}^n$ so that

$$R(f, P) = U(f, P).$$

Similarly, there are tags for which $R(f, P) = L(f, P)$ as well.

(b) If $f$ is not continuous, it may not be possible to find tags for which
$R(f, P) = U(f, P)$. Show, however, that given an arbitrary $\epsilon > 0$, it
is possible to pick tags for $P$ so that

$$U(f, P) - R(f, P) < \epsilon.$$

The analogous statement holds for lower sums.


Exercise 8.1.5. Use the results of the previous exercise to finish the proof of 
Theorem 8.1.2. □ 

Gauges and $\delta(x)$-fine Partitions

The key to the generalized Riemann integral is to allow the $\delta$ in Theorem 8.1.2
to be a function of $x$.


Definition 8.1.3. A function $\delta : [a, b] \to \mathbb{R}$ is called a gauge on $[a, b]$ if $\delta(x) > 0$
for all $x \in [a, b]$.


Definition 8.1.4. Given a particular gauge $\delta(x)$, a tagged partition $(P, \{c_k\}_{k=1}^n)$
is $\delta(x)$-fine if every subinterval $[x_{k-1}, x_k]$ satisfies $x_k - x_{k-1} < \delta(c_k)$. In other
words, each subinterval $[x_{k-1}, x_k]$ has width less than $\delta(c_k)$.


It is important to see that if $\delta(x)$ is a constant function, then Definition 8.1.4
says precisely the same thing as Definition 8.1.1. In the case where $\delta(x)$ is not a
constant, Definition 8.1.4 describes a way of measuring the fineness of partitions
that is quite different.
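
One way to get a feel for Definition 8.1.4 is to write the condition as a direct check. The sketch below is only an illustration: `is_gauge_fine` is our own name, and `sample_gauge` is a hypothetical gauge on $[0, 1]$ chosen so that the same partition passes with one choice of tags and fails with another.

```python
def is_gauge_fine(points, tags, gauge):
    """Check Definition 8.1.4: every subinterval [x_{k-1}, x_k] contains
    its tag c_k and has width less than gauge(c_k)."""
    if len(tags) != len(points) - 1:
        return False
    for x0, x1, c in zip(points, points[1:], tags):
        if not (x0 <= c <= x1):
            return False          # the tag must lie in its subinterval
        if not (x1 - x0 < gauge(c)):
            return False          # subinterval too wide for this tag
    return True

def sample_gauge(x):
    # Hypothetical gauge on [0, 1], chosen only for illustration.
    return 0.1 + x

points = [0.0, 0.05, 1.0]
print(is_gauge_fine(points, [0.0, 1.0], sample_gauge))  # tags 0 and 1
print(is_gauge_fine(points, [0.0, 0.5], sample_gauge))  # tags 0 and 1/2
```

With a nonconstant gauge, whether a partition qualifies depends on the tags as well as on the partition points, which is the difference from Definition 8.1.1 that the paragraph above points to.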


Exercise 8.1.6. Consider the interval $[0, 1]$.

(a) If $\delta(x) = 1/9$, find a $\delta(x)$-fine tagged partition of $[0, 1]$. Does the choice
of tags matter in this case?

(b) Let

$$\delta(x) = \begin{cases} 1/4 & \text{if } x = 0 \\ x/3 & \text{if } 0 < x \le 1. \end{cases}$$

Construct a $\delta(x)$-fine tagged partition of $[0, 1]$.

The tinkering required in Exercise 8.1.6 (b) may cast doubt on whether
an arbitrary gauge always admits a $\delta(x)$-fine partition. However, it is not too
difficult to show that this is indeed the case.


Theorem 8.1.5. Given a gauge $\delta(x)$ on an interval $[a, b]$, there exists a tagged
partition $(P, \{c_k\}_{k=1}^n)$ that is $\delta(x)$-fine.

Proof. Let $I_0 = [a, b]$. It may be possible to find a tag such that the trivial
partition $P = \{a, b\}$ works. Specifically, if $b - a < \delta(x)$ for some $x \in [a, b]$, then
we can set $c_1$ equal to such an $x$ and notice that $(P, \{c_1\})$ is $\delta(x)$-fine. If no
such $x$ exists, then bisect $[a, b]$ into two equal halves.


Exercise 8.1.7. Finish the proof of Theorem 8.1.5. 


□ 
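
The bisection idea in the proof can be imitated in code, with one important caveat: a program cannot search every point of a subinterval for a suitable tag. The sketch below therefore tries only the two endpoints and the midpoint as candidate tags (our own simplifying assumption) and gives up after a fixed number of bisections. The names `fine_partition_by_bisection` and `sample_gauge` are hypothetical, and the code is not a substitute for the termination argument requested in Exercise 8.1.7.

```python
def fine_partition_by_bisection(a, b, gauge, max_depth=60):
    """Return (points, tags) for a gauge-fine tagged partition of [a, b].

    Assumption: only the endpoints and midpoint of each subinterval are
    tried as tags, whereas the proof allows any point of the subinterval.
    """
    def pieces(lo, hi, depth):
        # Accept [lo, hi] as soon as some candidate tag makes it fine.
        for c in (lo, (lo + hi) / 2.0, hi):
            if hi - lo < gauge(c):
                return [(lo, hi, c)]
        if depth == 0:
            raise RuntimeError("no suitable tag found among the candidates")
        # Otherwise bisect and treat each half in the same way.
        mid = (lo + hi) / 2.0
        return pieces(lo, mid, depth - 1) + pieces(mid, hi, depth - 1)

    triples = pieces(a, b, max_depth)
    points = [a] + [hi for _, hi, _ in triples]
    tags = [c for _, _, c in triples]
    return points, tags

def sample_gauge(x):
    # Hypothetical gauge, small near 1/2 and larger elsewhere.
    return 0.02 + 0.5 * abs(x - 0.5)

points, tags = fine_partition_by_bisection(0.0, 1.0, sample_gauge)
for x0, x1, c in zip(points, points[1:], tags):
    print(x0, x1, c)
```

The output shows the subintervals shrinking as they approach the point where the sample gauge is smallest, which is the behavior one expects a nonconstant gauge to force.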


Generalized Riemann Integrability 

Keeping in mind that Theorem 8.1.2 offers an equivalent way to define Riemann 
integrability, we now propose a new method for defining the value of the integral. 

Definition 8.1.6. A function $f$ on $[a, b]$ has generalized Riemann integral $A$
if, for every $\epsilon > 0$, there exists a gauge $\delta(x)$ on $[a, b]$ such that for each tagged
partition $(P, \{c_k\}_{k=1}^n)$ that is $\delta(x)$-fine, it is true that

$$|R(f, P) - A| < \epsilon.$$

In this case, we write $A = \int_a^b f$.

Theorem 8.1.7. If a function has a generalized Riemann integral, then the
value of the integral is unique.

Proof. Assume that a function $f$ has generalized Riemann integral $A_1$ and that
it also has generalized Riemann integral $A_2$. We must prove $A_1 = A_2$.

Exercise 8.1.8. Finish the argument. □

The implications of Definition 8.1.6 on the resulting class of integrable func- 
tions are far reaching. This is somewhat surprising given that the criteria for 
integrability in Definition 8.1.6 and Theorem 8.1.2 differ in such a small way. 
One observation that should be immediately evident is the following. 

Exercise 8.1.9. Explain why every function that is Riemann-integrable with
$\int_a^b f = A$ must also have generalized Riemann integral $A$.

The converse statement is not true, and that is the important point. One
example that we have of a non-Riemann-integrable function is Dirichlet’s function

$$g(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases}$$

which has discontinuities at every point of $\mathbb{R}$.

Theorem 8.1.8. Dirichlet’s function $g(x)$ is generalized Riemann-integrable on
$[0, 1]$ with $\int_0^1 g = 0$.

Proof. Let $\epsilon > 0$. By Definition 8.1.6, we must construct a gauge $\delta(x)$ on $[0, 1]$
such that whenever $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, it follows that

$$0 \le \sum_{k=1}^{n} g(c_k)(x_k - x_{k-1}) < \epsilon.$$

The gauge represents a restriction on the size of $\Delta x_k = x_k - x_{k-1}$ in the sense
that $\Delta x_k < \delta(c_k)$. The Riemann sum consists of products of the form $g(c_k) \Delta x_k$.
Thus, for irrational tags, there is nothing to worry about because $g(c_k) = 0$ in
this case. Our task is to make sure that any time a tag $c_k$ is rational, it comes
from a suitably thin subinterval.

Let $\{r_1, r_2, r_3, \ldots\}$ be an enumeration of the countable set of rational numbers
contained in $[0, 1]$. For each $r_k$, set $\delta(r_k) = \epsilon/2^{k+1}$. For $x$ irrational, set
$\delta(x) = 1$.

Exercise 8.1.10. Show that if $(P, \{c_k\}_{k=1}^n)$ is a $\delta(x)$-fine tagged partition, then
$R(g, P) < \epsilon$. □

Dirichlet’s function fails to be Riemann-integrable because, given any (untagged)
partition, it is possible to make $R(g, P) = 1$ or $R(g, P) = 0$ by choosing
the tags to be either all rational or all irrational. For the generalized Riemann
integral, choosing all rational tags results in a tagged partition that is
not $\delta(x)$-fine (when $\delta(x)$ is small on rational points) and so does not have to be
considered. In general, allowing for nonconstant gauges allows us to be more
discriminating about which tagged partitions qualify as $\delta(x)$-fine. The result,
as we have just seen, is that it may be easier to achieve the inequality

$$|R(f, P) - A| < \epsilon$$

for the often smaller and more carefully selected set of tagged partitions that
remain.


The Fundamental Theorem of Calculus 

We conclude this brief introduction to the generalized Riemann integral with a 
proof of the Fundamental Theorem of Calculus. As was alluded to earlier, the 
most notable distinction between the following theorem and part (i) of Theorem 
7.5.1 is that here we do not need to assume that the derivative function is
integrable. Using the generalized Riemann integral, every derivative is integrable,
and the integral can be evaluated using the antiderivative in the familiar way. 
It is also interesting to note that in Theorem 7.5.1 the Mean Value Theorem 
played the crucial role in the argument, but it is not needed here. 


Theorem 8.1.9. Assume $F : [a, b] \to \mathbb{R}$ is differentiable at each point in $[a, b]$
and set $f(x) = F'(x)$. Then, $f$ has the generalized Riemann integral

$$\int_a^b f = F(b) - F(a).$$

Proof. Let $P = \{x_0, x_1, x_2, \ldots, x_n\}$ be a partition of $[a, b]$. Both this proof and
the proof of Theorem 7.5.1 make use of the following fact.

Exercise 8.1.11. Show that

$$F(b) - F(a) = \sum_{k=1}^{n} [F(x_k) - F(x_{k-1})].$$

If $\{c_k\}_{k=1}^n$ is a set of tags for $P$, then we can estimate the difference between
the Riemann sum $R(f, P)$ and $F(b) - F(a)$ by

$$\begin{aligned} |F(b) - F(a) - R(f, P)| &= \left| \sum_{k=1}^{n} [F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})] \right| \\ &\le \sum_{k=1}^{n} |F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})|. \end{aligned}$$

Let $\epsilon > 0$. To prove the theorem, we must construct a gauge $\delta(c)$ such that

$$|F(b) - F(a) - R(f, P)| < \epsilon \tag{2}$$

for all $(P, \{c_k\})$ that are $\delta(c)$-fine. (Using the variable $c$ in the gauge function
is more convenient than $x$ in this case.)

Exercise 8.1.12. For each $c \in [a, b]$, explain why there exists a $\delta(c) > 0$ (a
$\delta > 0$ depending on $c$) such that

$$\left| \frac{F(x) - F(c)}{x - c} - f(c) \right| < \epsilon$$

for all $0 < |x - c| < \delta(c)$.

This $\delta(c)$ is the desired gauge on $[a, b]$. Let $(P, \{c_k\}_{k=1}^n)$ be a $\delta(c)$-fine partition
of $[a, b]$. It just remains to show that equation (2) is satisfied for this tagged
partition.

Exercise 8.1.13. (a) For a particular $c_k \in [x_{k-1}, x_k]$ of $P$, show that

$$|F(x_k) - F(c_k) - f(c_k)(x_k - c_k)| \le \epsilon (x_k - c_k)$$

and

$$|F(c_k) - F(x_{k-1}) - f(c_k)(c_k - x_{k-1})| \le \epsilon (c_k - x_{k-1}).$$

(b) Now, argue that

$$|F(x_k) - F(x_{k-1}) - f(c_k)(x_k - x_{k-1})| < \epsilon (x_k - x_{k-1}),$$

and use this fact to complete the proof of the theorem.

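□

If we consider the function

$$F(x) = \begin{cases} x^{3/2} \sin(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$$

then it is not too difficult to show that $F$ is differentiable everywhere, including
$x = 0$, with

$$F'(x) = \begin{cases} (3/2)\sqrt{x} \sin(1/x) - (1/\sqrt{x}) \cos(1/x) & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$
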
What is notable here is that the derivative is unbounded near the origin. The
theory of the ordinary Riemann integral begins with the assumption that we 
only consider bounded functions on closed intervals, but there is no such re- 
striction for the generalized Riemann integral. Theorem 8.1.9 proves that F' 
has a generalized integral. Now, improper Riemann integrals have been created 
to extend Riemann integration to some unbounded functions, but it is another 
interesting fact about the generalized Riemann integral that any function hav- 
ing an improper integral must already be integrable in the sense described in 
Definition 8.1.6. 
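
Both claims about this example, that $F'(0) = 0$ and that $F'$ is unbounded near the origin, can be explored numerically. The sketch below is an illustration rather than a proof; it assumes we only evaluate at $x \ge 0$ (where $x^{3/2}$ is real), and the sample points $x_k = 1/(2\pi k)$ are our own choice, picked so that $\sin(1/x_k) = 0$ and $\cos(1/x_k) = 1$.

```python
import math

def F(x):
    """F(x) = x**1.5 * sin(1/x) for x > 0 and F(0) = 0 (domain x >= 0)."""
    return 0.0 if x == 0 else x ** 1.5 * math.sin(1.0 / x)

def F_prime(x):
    """Derivative from the displayed formula, with F'(0) = 0."""
    if x == 0:
        return 0.0
    return (1.5 * math.sqrt(x) * math.sin(1.0 / x)
            - math.cos(1.0 / x) / math.sqrt(x))

# Difference quotients at 0 are bounded in size by sqrt(h), so they shrink.
for h in (1e-2, 1e-4, 1e-6):
    print(h, (F(h) - F(0.0)) / h)

# At x_k = 1/(2*pi*k) the formula gives F'(x_k) close to -sqrt(2*pi*k),
# which grows without bound in absolute value as k increases.
for k in (1, 100, 10000):
    x_k = 1.0 / (2.0 * math.pi * k)
    print(x_k, F_prime(x_k))
```

The first loop reflects the estimate $|F(h)/h| \le \sqrt{h}$, and the second exhibits points approaching the origin along which $|F'|$ grows like $\sqrt{2\pi k}$.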

As a parting gesture, let’s show how Theorem 8.1.9 yields a short verification 
of the substitution technique from calculus. 

Theorem 8.1.10 (Change-of-variable Formula). Let $g : [a, b] \to \mathbb{R}$ be
differentiable at each point of $[a, b]$, and assume $F$ is differentiable on the set
$g([a, b])$. If $f(x) = F'(x)$ for all $x \in g([a, b])$, then

$$\int_a^b (f \circ g) \cdot g' = \int_{g(a)}^{g(b)} f.$$

Proof. The hypothesis of the theorem guarantees that the function $(F \circ g)(x)$
is differentiable for all $x \in [a, b]$.

Exercise 8.1.14. (a) Why are we sure that $f$ and $(F \circ g)'$ have generalized
Riemann integrals?

(b) Use Theorem 8.1.9 to finish the proof.


□ 
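
A quick numerical sanity check of the change-of-variable formula is sketched below for one specific case. The choices $g(x) = x^2$ on $[0, 1]$, $F = \sin$, and $f = \cos$ are our own, as is the use of midpoint-tagged Riemann sums with equal subintervals to approximate each side.

```python
import math

def midpoint_sum(h, a, b, n=100000):
    """Riemann sum of h over [a, b] with n equal subintervals,
    each tagged at its midpoint."""
    width = (b - a) / n
    return sum(h(a + (k + 0.5) * width) for k in range(n)) * width

def g(x):
    return x * x          # inner function, differentiable on [0, 1]

def g_prime(x):
    return 2.0 * x

f = math.cos              # f = F' where F = sin

left = midpoint_sum(lambda x: f(g(x)) * g_prime(x), 0.0, 1.0)
right = midpoint_sum(f, g(0.0), g(1.0))
exact = math.sin(g(1.0)) - math.sin(g(0.0))   # F(g(b)) - F(g(a))
print(left, right, exact)
```

All three printed values should agree to many decimal places, consistent with both sides of the formula equaling $F(g(b)) - F(g(a))$.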


The impressive properties of the generalized Riemann integral do not end 
here. The central source for the material in this section is Robert Bartle’s 
award-winning article “Return to the Riemann Integral,” which appeared in the
American Mathematical Monthly, October 1996. The article goes on to discuss
convergence theorems for this new integral in the spirit of Theorem 7.4.4, and 
outlines the argument that the collection of integrable functions is strictly larger 
when the Lebesgue integral is replaced by the generalized Riemann integral. In 
light of this, the author boldly declares that “the time has come to discard the 
Lebesgue integral as the primary integral.” (Italics in the original.)

That this revolution has not come to pass may simply be due to a case of 
overwhelming inertia, but a contributing factor is very likely the geometrically 
satisfying intuition of Lebesgue’s theory. At the heart of Lebesgue’s approach to 
integration is the desire to generalize the concepts of length and area. Although 
one can certainly use a properly developed integral to give a rigorous definition 
for the length — or measure — of a general set, there is a compelling argument 
that this puts the ideas in the wrong pedagogical order. Rather than using a 
sophisticated integral to generalize a primitive notion such as length, Lebesgue 
found an effective way to talk about the length of a very wide class of sets, and 
used that to build his definition of the integral. The very elegant result of his 
endeavor is likely to be the industry standard for a long time to come. 


8.2 Metric Spaces and the Baire Category Theorem

A natural question to ask is whether the theorems we have proved about sequences,
series, and functions in $\mathbb{R}$ have analogues in the plane $\mathbb{R}^2$ or in even
higher dimensions. Looking back over the proofs, one crucial observation is
that most of the arguments depend on just a few basic properties of the absolute
value function. Interpreting the statement “$|x - y|$” to mean the “distance
from $x$ to $y$ in $\mathbb{R}$,” our aim is to experiment with other ways of measuring distance
on other sets such as $\mathbb{R}^2$ and $C[0, 1]$, the space of continuous functions on
$[0, 1]$.

Definition 8.2.1. Given a set $X$, a function $d : X \times X \to \mathbb{R}$ is a metric on $X$
if for all $x, y \in X$:

(i) $d(x, y) \ge 0$ with $d(x, y) = 0$ if and only if $x = y$,

(ii) $d(x, y) = d(y, x)$, and

(iii) for all $z \in X$, $d(x, y) \le d(x, z) + d(z, y)$.

A metric space is a set X together with a metric d. 

Property (iii) in the previous definition is the “triangle inequality.” The next 
two exercises illustrate the point that the same set X can be home to several 
different metrics. When referring to a metric space, we must specify the set and 
the particular distance function d. 

Exercise 8.2.1. Decide which of the following are metrics on $X = \mathbb{R}^2$. For
each, we let $x = (x_1, x_2)$ and $y = (y_1, y_2)$ be points in the plane.

(a) $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.

(b) $d(x, y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}$.

(c) $d(x, y) = |x_1 x_2 + y_1 y_2|$.

The metric in part (a) of the previous exercise is the familiar Euclidean
distance between two points in the plane. This is often referred to as the “usual”
or “standard” metric on $\mathbb{R}^2$. The usual metric on $\mathbb{R}$ is our old friend
$d(x, y) = |x - y|$.

Exercise 8.2.2. Let $C[0, 1]$ be the collection of continuous functions on the
closed interval $[0, 1]$. Decide which of the following are metrics on $C[0, 1]$.

(a) $d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$.

(b) $d(f, g) = |f(1) - g(1)|$.

(c) $d(f, g) = \int_0^1 |f - g|$.





The following distance function is called the discrete metric and can be
defined on any set $X$. For any $x, y \in X$, let

$$\rho(x, y) = \begin{cases} 1 & \text{if } x \ne y \\ 0 & \text{if } x = y. \end{cases}$$


Exercise 8.2.3. Verify that the discrete metric is actually a metric. 
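
Before writing a proof, it can be reassuring to test the three properties of Definition 8.2.1 on a handful of points. The sketch below does this by brute force; the function names and the sample of integers are our own, and passing such a finite test is evidence only, not a verification that a function is a metric.

```python
from itertools import product

def discrete_metric(x, y):
    """rho(x, y) = 1 if x != y and 0 if x == y."""
    return 0 if x == y else 1

def usual_metric(x, y):
    """The usual metric on R: d(x, y) = |x - y|."""
    return abs(x - y)

def satisfies_axioms_on(sample, d):
    """Test properties (i)-(iii) of Definition 8.2.1 on a finite sample."""
    for x, y in product(sample, repeat=2):
        if d(x, y) < 0 or (d(x, y) == 0) != (x == y):
            return False                     # property (i) fails
        if d(x, y) != d(y, x):
            return False                     # property (ii) fails
    for x, y, z in product(sample, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):
            return False                     # property (iii) fails
    return True

sample = [-2, 0, 1, 3, 7]
print(satisfies_axioms_on(sample, discrete_metric))
print(satisfies_axioms_on(sample, usual_metric))
```

A failed check, on the other hand, is conclusive: a single triple violating the triangle inequality shows that a candidate distance function is not a metric.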


Basic Definitions 

Definition 8.2.2. Let $(X, d)$ be a metric space. A sequence $(x_n) \subseteq X$ converges
to an element $x \in X$ if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that $d(x_n, x) < \epsilon$
whenever $n \ge N$.

Definition 8.2.3. A sequence $(x_n)$ in a metric space $(X, d)$ is a Cauchy sequence
if for all $\epsilon > 0$ there exists an $N \in \mathbb{N}$ such that $d(x_m, x_n) < \epsilon$ whenever
$m, n \ge N$.

Exercise 8.2.4. Show that a convergent sequence is Cauchy. 

The Cauchy Criterion, as it is called in R, was an “if and only if” statement. 
In the general metric space setting, however, the converse statement does not 
always hold. Recall that, in R, the assertion that “Cauchy sequences converge” 
was shown to be equivalent to the Axiom of Completeness. In order to transport 
the Axiom of Completeness into a metric space, we would need to have an 
ordering on our space so that we could discuss such things as upper bounds. It 
is an interesting observation that not every set can be ordered in a satisfying 
way (the points in $\mathbb{R}^2$ for example). Even without an ordering, we are still going
to want completeness. For metric spaces, the convergence of Cauchy sequences 
is taken to be the definition of completeness. 

Definition 8.2.4. A metric space $(X, d)$ is complete if every Cauchy sequence
in $X$ converges to an element of $X$.

Exercise 8.2.5. (a) Consider $\mathbb{R}^2$ with the discrete metric $\rho(x, y)$ examined
in Exercise 8.2.3. What do Cauchy sequences look like in this space? Is
$\mathbb{R}^2$ complete with respect to this metric?

(b) Show that $C[0, 1]$ is complete with respect to the metric in Exercise
8.2.2 (a).

(c) Define $C^1[0, 1]$ to be the collection of differentiable functions on $[0, 1]$ whose
derivatives are also continuous. Is $C^1[0, 1]$ complete with respect to the
metric defined in Exercise 8.2.2 (a)?

Because completeness is a prerequisite for doing anything significant in the
way of analysis, the metric in Exercise 8.2.2 (a) is the most natural metric to
consider when working with $C[0, 1]$. The notation

$$\|f - g\|_\infty = d(f, g) = \sup\{|f(x) - g(x)| : x \in [0, 1]\}$$

is standard, and setting $g = 0$ gives the so-called “sup norm”

$$\|f\|_\infty = d(f, 0) = \sup\{|f(x)| : x \in [0, 1]\}.$$

In all upcoming discussions, it is assumed that the space $C[0, 1]$ is endowed with
this metric unless otherwise specified.
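
The sup-norm distance between two specific functions can be estimated by sampling. The sketch below is a rough approximation under our own assumptions: it takes the maximum over an evenly spaced grid in $[0, 1]$, which can only underestimate the true supremum, and the names `sup_distance`, `identity`, and `square` are hypothetical.

```python
def sup_distance(f, g, samples=10001):
    """Approximate d(f, g) = sup{|f(x) - g(x)| : x in [0, 1]} by taking
    the maximum over an evenly spaced grid. The true supremum is at
    least this value."""
    grid = (k / (samples - 1) for k in range(samples))
    return max(abs(f(x) - g(x)) for x in grid)

def identity(x):
    return x

def square(x):
    return x * x

print(sup_distance(identity, square))   # |x - x^2| peaks at x = 1/2
```

For continuous functions the grid estimate approaches the true distance as the grid is refined, since the continuous function $|f - g|$ attains its maximum on $[0, 1]$.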

Definition 8.2.5. Let $(X, d_1)$ and $(Y, d_2)$ be metric spaces. A function
$f : X \to Y$ is continuous at $x \in X$ if for all $\epsilon > 0$ there exists a $\delta > 0$ such that
$d_2(f(x), f(y)) < \epsilon$ whenever $d_1(x, y) < \delta$.

Exercise 8.2.6. Which of these functions from $C[0, 1]$ to $\mathbb{R}$ (with the usual
metric) are continuous?

(a) $g(f) = \int_0^1 f k$, where $k$ is some fixed function in $C[0, 1]$.

(b) $g(f) = f(1/2)$.

(c) $g(f) = f(1/2)$, but this time with respect to the metric on $C[0, 1]$ from
Exercise 8.2.2 (c).

Topology on Metric Spaces 

Definition 8.2.6. Given $\epsilon > 0$ and an element $x$ in the metric space $(X, d)$,
the $\epsilon$-neighborhood of $x$ is the set $V_\epsilon(x) = \{y \in X : d(x, y) < \epsilon\}$.

Exercise 8.2.7. Describe the $\epsilon$-neighborhoods in $\mathbb{R}^2$ for each of the different
metrics described in Exercise 8.2.1. How about for the discrete metric?

With the definition of an $\epsilon$-neighborhood, we can now define open sets, limit
points, and closed sets exactly as we did before. A set $O \subseteq X$ is open if for
every $x \in O$ we can find a neighborhood $V_\epsilon(x) \subseteq O$. A point $x$ is a limit point
of a set $A$ if every $V_\epsilon(x)$ intersects $A$ in some point other than $x$. A set $C$ is
closed if it contains its limit points.

Exercise 8.2.8. Let $(X, d)$ be a metric space.

(a) Verify that a typical $\epsilon$-neighborhood $V_\epsilon(x)$ is an open set. Is the set

$$C_\epsilon(x) = \{y \in X : d(x, y) \le \epsilon\}$$

a closed set?

(b) Show that a set $E \subseteq X$ is open if and only if its complement is closed.

Exercise 8.2.9. (a) Show that the set $Y = \{f \in C[0, 1] : \|f\|_\infty \le 1\}$ is
closed in $C[0, 1]$.

(b) Is the set $T = \{f \in C[0, 1] : f(0) = 0\}$ open, closed, or neither in $C[0, 1]$?

We define compactness in metric spaces just as we did for $\mathbb{R}$.

Definition 8.2.7. A subset $K$ of a metric space $(X, d)$ is compact if every
sequence in $K$ has a convergent subsequence that converges to a limit in $K$.

An extremely useful characterization of compactness in R is the proposition 
that a set is compact if and only if it is closed and bounded. For abstract metric 
spaces, this proposition only holds in the forward direction. 

Exercise 8.2.10. (a) Supply a definition for bounded subsets of a metric
space $(X, d)$.

(b) Show that if $K$ is a compact subset of the metric space $(X, d)$, then $K$ is
closed and bounded.

(c) Show that $Y \subseteq C[0, 1]$ from Exercise 8.2.9 (a) is closed and bounded but
not compact.

A good hint for part (c) of the previous exercise can be found in Exer- 
cise 6.2.14 from Chapter 6. This exercise defines the concept of an equicontin- 
uous family of functions, which is a key ingredient in the Arzela-Ascoli The- 
orem (Exercise 6.2.15). The Arzela-Ascoli Theorem states that any bounded, 
equicontinuous collection of functions in C[0, 1] must have a uniformly conver- 
gent subsequence. One way to summarize this famous result — which we did not 
have the language for in Chapter 6 — is as a statement describing a particular 
class of compact subsets in $C[0, 1]$. Looking at the definition of compactness,
and remembering that the uniform limit of continuous functions is continuous, 
the Arzela-Ascoli Theorem states that any closed, bounded, equicontinuous 
collection of functions is a compact subset of $C[0, 1]$.

Definition 8.2.8. Given a subset $E$ of a metric space $(X, d)$, the closure $\overline{E}$ is
the union of $E$ together with its limit points. The interior of $E$ is denoted by
$E^\circ$ and is defined as

$$E^\circ = \{x \in E : \text{there exists } V_\epsilon(x) \subseteq E\}.$$

Closure and interior are dual concepts. Results about these concepts come 
in pairs and exhibit an elegant and useful symmetry. 

Exercise 8.2.11. (a) Show that E is closed if and only if E = E. Show that 

E is open if and only if E° = E. 

(b) Show that E° = (E c )°, and similarly that (E°) c = E c . 

A good hint for this exercise is to review the proofs from Chapter 3, where 
closure at least is discussed. Thinking of all of these concepts as they relate 
to R or R 2 with the usual metric is not a bad idea. However, it is important 
to remember also that rigorous proofs must be constructed purely from the 
relevant definitions. 

Exercise 8.2.12. (a) Show

$$\overline{V_\epsilon(x)} \subseteq \{y \in X : d(x, y) \le \epsilon\},$$

in an arbitrary metric space $(X, d)$.

(b) To keep things from sounding too familiar, find an example of a specific
metric space where

$$\overline{V_\epsilon(x)} \ne \{y \in X : d(x, y) \le \epsilon\}.$$

We are on our way toward the Baire Category Theorem. The next definitions 
provide the final bit of vocabulary needed to state the result. 

Definition 8.2.9. A set $A \subseteq X$ is dense in the metric space $(X, d)$ if $\overline{A} = X$.
A subset $E$ of a metric space $(X, d)$ is nowhere-dense in $X$ if $\overline{E}^\circ$ is empty.

Exercise 8.2.13. If $E$ is a subset of a metric space $(X, d)$, show that $E$ is
nowhere-dense in $X$ if and only if $\overline{E}^c$ is dense in $X$.

The Baire Category Theorem 

In Section 3.5, we proved Baire’s Theorem, which states that it is impossible to 
write the real numbers R as the countable union of nowhere-dense sets. Previous 
to this, we knew that R was too big to be written as the countable union of single 
points (R is uncountable), but Baire’s Theorem improves on this by asserting 
that the only way to make R from a countable union of arbitrary sets is for 
the closure of at least one of these sets to contain an interval. The keystone 
to the proof of Baire’s Theorem is the completeness of R. The idea now is to 
replace R with an arbitrary complete metric space and prove the theorem in 
this more general setting. This leads to a statement that can be used to discuss 
the size and structure of other spaces such as R 2 and C[0, 1]. At the end of 
Chapter 3, we mentioned one particularly fascinating implication of this result 
for C[ 0, 1], which is that — despite the substantial difficulty required to produce 
an example of one — most continuous functions are nowhere-differentiable. It 
would be a good idea at this point to reread Sections 3.6 and 5.5. We are now 
equipped to carry out the details promised in these discussions. 

Theorem 8.2.10. Let $(X, d)$ be a complete metric space, and let $\{O_n\}$ be a
countable collection of dense, open subsets of $X$. Then, $\bigcap_{n=1}^\infty O_n$ is not empty.

Proof. When we proved this theorem on R, completeness manifested itself in
the form of the Nested Interval Property. We could derive something akin
to NIP in the metric space setting, but instead let’s take an approach that
uses the convergence of Cauchy sequences (because this is how we have defined
completeness).

Pick $x_1 \in O_1$. Because $O_1$ is open, there exists an $\epsilon_1 > 0$ such that
$V_{\epsilon_1}(x_1) \subseteq O_1$.

Exercise 8.2.14. (a) Give the details for why we know there exists a point
$x_2 \in V_{\epsilon_1}(x_1) \cap O_2$ and an $\epsilon_2 > 0$ satisfying $\epsilon_2 < \epsilon_1/2$ with $V_{\epsilon_2}(x_2)$ contained
in $O_2$ and

$$\overline{V_{\epsilon_2}(x_2)} \subseteq V_{\epsilon_1}(x_1).$$

(b) Proceed along this line and use the completeness of $(X, d)$ to produce a
single point $x \in O_n$ for every $n \in \mathbf{N}$. □

Theorem 8.2.11 (Baire Category Theorem). A complete metric space is 
not the union of a countable collection of nowhere-dense sets. 

Exercise 8.2.15. Complete the proof of the theorem. 

This result is called the Baire Category Theorem because it creates two 
categories of size for subsets in a metric space. A set of “first category” is one 
that can be written as a countable union of nowhere-dense sets. These are the 
small, intuitively thin subsets of a metric space. We now see that if our metric 
space is complete, then it is necessarily of “second category,” meaning it cannot 
be written as a countable union of nowhere-dense sets. Given a subset A of a 
complete metric space X, showing that A is of first category is a mathematically 
precise way of demonstrating that A constitutes a very minor portion of the set 
X. The term “meager” is often used to mean a set of first category. 

With the stage set, we now outline the argument that continuous functions 
that are differentiable at even one point of [0,1] form a meager subset of the 
metric space C[ 0, 1]. 

Theorem 8.2.12. The set

$$D = \{f \in C[0,1] : f'(x) \text{ exists for some } x \in [0,1]\}$$

is a set of first category in $C[0,1]$.

Proof. For each pair of natural numbers $m, n$, define

$$A_{m,n} = \left\{ f \in C[0,1] : \text{there exists } x \in [0,1] \text{ where } \left| \frac{f(x) - f(t)}{x - t} \right| \le n \text{ whenever } 0 < |x - t| < \frac{1}{m} \right\}.$$

This definition takes some time to digest. Think of $1/m$ as defining a
$\delta$-neighborhood around the point $x$, and view $n$ as an upper bound on the
magnitude of the slopes of lines through the two points $(x, f(x))$ and $(t, f(t))$. The
set $A_{m,n}$ contains any function in $C[0,1]$ for which it is possible to find at
least one point $x$ where the slopes through $(x, f(x))$ and points on the function
nearby (within $1/m$, to be precise) are bounded by $n$.

Exercise 8.2.16. Show that if $f \in C[0,1]$ is differentiable at a point $x \in [0,1]$,
then $f \in A_{m,n}$ for some pair $m, n \in \mathbf{N}$.

The collection of subsets $\{A_{m,n} : m, n \in \mathbf{N}\}$ is countable, and we have just
seen that the union of these sets contains our set $D$. Because it is not difficult
to see that a subset of a set of first category is first category, the final hurdle in
the argument is to prove that each $A_{m,n}$ is nowhere-dense in $C[0,1]$.

Fix $m$ and $n$. The first order of business is to prove that $A_{m,n}$ is a closed
set. To this end, let $(f_k)$ be a sequence in $A_{m,n}$ and assume $f_k \to f$ in $C[0,1]$.
We need to show $f \in A_{m,n}$.

Because $f_k \in A_{m,n}$, for each $k \in \mathbf{N}$ there exists a point $x_k \in [0,1]$
where

$$\left| \frac{f_k(x_k) - f_k(t)}{x_k - t} \right| \le n \quad \text{for all } 0 < |x_k - t| < 1/m.$$

Exercise 8.2.17. (a) The sequence $(x_k)$ does not necessarily converge, but
explain why there exists a subsequence $(x_{k_l})$ that is convergent. Let $x = \lim(x_{k_l})$.

(b) Prove that $f_{k_l}(x_{k_l}) \to f(x)$.

(c) Now finish the proof that $A_{m,n}$ is closed.


Because $A_{m,n}$ is closed, $\overline{A_{m,n}} = A_{m,n}$. In order to prove that $A_{m,n}$ is
nowhere-dense, we just have to show that it contains no $\epsilon$-neighborhoods, so
pick an arbitrary $f \in A_{m,n}$, let $\epsilon > 0$, and consider the $\epsilon$-neighborhood $V_\epsilon(f)$
in $C[0,1]$. To show that this set is not contained in $A_{m,n}$, we must produce a
function $g \in C[0,1]$ that satisfies $\|f - g\|_\infty < \epsilon$ and has the property that there
is no point $x \in [0,1]$ where

$$\left| \frac{g(x) - g(t)}{x - t} \right| \le n \quad \text{for all } 0 < |x - t| < 1/m.$$


Exercise 8.2.18. A continuous function is called polygonal if its graph consists
of a finite number of line segments.

(a) Show that there exists a polygonal function $p \in C[0,1]$ satisfying
$\|f - p\|_\infty < \epsilon/2$.

(b) Show that if $h$ is any function in $C[0,1]$ that is bounded by 1, then the
function

$$g(x) = p(x) + \frac{\epsilon}{2} h(x)$$

satisfies $g \in V_\epsilon(f)$.

(c) Construct a polygonal function $h(x)$ in $C[0,1]$ that is bounded by 1 and
leads to the conclusion $g \notin A_{m,n}$, where $g$ is defined as in (b). Explain
how this completes the argument for Theorem 8.2.12. □


8.3 Euler’s Sum 

In Section 6.1 we saw Euler’s first and most famous derivation of the formula

$$1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25} + \cdots = \frac{\pi^2}{6}.$$

At the crux of this argument are two representations for the function $\sin(x)$.
The first is the standard Taylor series representation

$$(1) \qquad \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots,$$

and the second is an infinite product representation

$$(2) \qquad \sin(x) = x\left(1 - \frac{x}{\pi}\right)\left(1 + \frac{x}{\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 + \frac{x}{2\pi}\right)\cdots.$$

Although we have since made rigorous sense of the first equation (Example 6.6.4),
proving the validity of equation (2) is still beyond our means.

The news is not all bad, however. In the time since Euler first made this 
discovery, dozens of different proofs for this result have been published, start- 
ing with several by Euler himself and continuing right up to the present. The 
machinery required in these arguments runs the gamut from multi-variable cal- 
culus to Fourier series to complex integration, but one in particular due to Boo 
Rim Choe relies mainly on Taylor series expansions and properties of uniformly 
convergent series. Choe’s argument was published in 1987 but actually has much 
in common with one of Euler’s original attempts. The proof outlined in this 
section follows Choe’s argument with some simplifications due to Peter Duren. 


Wallis’s Product 


Even though we don’t currently have the tools to prove the infinite product 
formula for sin(x) in equation (2), we can prove a special case. 

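Exercise 8.3.1. Supply the details to show that when $x = \pi/2$ the product
formula in (2) is equivalent to

$$(3) \qquad \frac{\pi}{2} = \lim_{n \to \infty} \left(\frac{2 \cdot 2}{1 \cdot 3}\right)\left(\frac{4 \cdot 4}{3 \cdot 5}\right)\left(\frac{6 \cdot 6}{5 \cdot 7}\right) \cdots \left(\frac{2n \cdot 2n}{(2n-1)(2n+1)}\right),$$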



where the infinite product in (2) is interpreted to be a limit of partial products. 
(Although it is not necessary for what follows, it might be useful to review the 
treatment of infinite products in Exercises 2.4.10 and 2.7.10.) 

The goal of the next few exercises is to supply a proper proof for equation (3).
This curious formula involving $\pi$ was first discovered by John Wallis (1616-1703)
and will provide some key ingredients for our proof of Euler’s sum. It resurfaces
in Section 8.4 where the factorial function is defined.

Set

$$b_n = \int_0^{\pi/2} \sin^n(x)\,dx, \quad \text{for } n = 0, 1, 2, \ldots.$$

The first few terms are easy enough to calculate; in particular,

$$b_0 = \frac{\pi}{2} \quad \text{and} \quad b_1 = \int_0^{\pi/2} \sin(x)\,dx = 1.$$


1 [ 13 ], p. 92-95 
Exercise 8.3.2. Assume $h(x)$ and $k(x)$ have continuous derivatives on $[a, b]$
and derive the integration-by-parts formula

$$\int_a^b h(t)k'(t)\,dt = h(b)k(b) - h(a)k(a) - \int_a^b h'(t)k(t)\,dt.$$


Exercise 8.3.3. (a) Using the simple identity $\sin^n(x) = \sin^{n-1}(x)\sin(x)$ and
the previous exercise, derive the recurrence relation

$$b_n = \frac{n-1}{n}\, b_{n-2} \quad \text{for all } n \ge 2.$$

(b) Use this relation to generate the first three even terms and the first three
odd terms of the sequence $(b_n)$.

(c) Write a general expression for $b_{2n}$ and $b_{2n+1}$.

Because $0 \le \sin^{n+1}(x) \le \sin^n(x)$ on $[0, \pi/2]$, it follows that $b_{n+1} \le b_n$ and
$(b_n)$ is decreasing. It turns out that $(b_n) \to 0$, but that isn’t the limit we are
interested in at the moment.


Exercise 8.3.4. Show

$$\lim_{n \to \infty} \frac{b_{2n}}{b_{2n+1}} = 1,$$

and use this fact to finish the proof of Wallis’s product formula in (3).
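
A quick numerical experiment can make the recurrence and the limit above feel less abstract. The following Python sketch is an illustration under stated assumptions, not part of the proof: the function names `b` and `wallis_partial` are our own, and floating-point arithmetic stands in for exact values. It builds b_n from b_0 and b_1 using the recurrence of Exercise 8.3.3 and compares the ratio b_{2n}/b_{2n+1} with the partial products in (3).

```python
import math


def b(n):
    """b_n = integral of sin(x)**n over [0, pi/2], built from b_0 = pi/2 and
    b_1 = 1 with the recurrence b_n = ((n - 1) / n) * b_{n-2}."""
    value = math.pi / 2 if n % 2 == 0 else 1.0
    start = 2 if n % 2 == 0 else 3
    for k in range(start, n + 1, 2):
        value *= (k - 1) / k
    return value


def wallis_partial(n):
    """Partial product of (2k)(2k) / ((2k - 1)(2k + 1)) for k = 1, ..., n."""
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return product


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, b(2 * n) / b(2 * n + 1), wallis_partial(n))
```

The printed ratios drift down toward 1 while the partial products climb toward pi/2, which is the relationship the exercise asks us to establish rigorously.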

There are some standard techniques for working with the notation of
equation (3). For instance,

$$2 \cdot 4 \cdot 6 \cdots (2n) = 2^n n!$$

and

$$1 \cdot 3 \cdot 5 \cdots (2n+1) = \frac{(2n+1)!}{2 \cdot 4 \cdot 6 \cdots (2n)} = \frac{(2n+1)!}{2^n n!}.$$

Exercise 8.3.5. Derive the following alternative form of Wallis’s product
formula:

$$\sqrt{\pi} = \lim_{n \to \infty} \frac{2^{2n}(n!)^2}{(2n)!\sqrt{n}}.$$


Taylor Series 

The next step in the argument is to generate the Taylor series for $\arcsin(x)$. This
is not really possible to do directly from Taylor’s formula for the coefficients,
but keeping in mind that

$$(\arcsin(x))' = \frac{1}{\sqrt{1 - x^2}},$$

we can get where we want to go by first finding the expansion for $1/\sqrt{1-x}$.

Exercise 8.3.6. Show that $1/\sqrt{1-x}$ has Taylor expansion $\sum_{n=0}^\infty c_n x^n$, where
$c_0 = 1$ and

$$c_n = \frac{(2n)!}{2^{2n}(n!)^2} = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2 \cdot 4 \cdot 6 \cdots 2n}$$

for $n \ge 1$.


The coefficients $c_n$ should look familiar from our work on Wallis’s product.
Exercise 8.3.5 can be rephrased as

$$\sqrt{\pi} = \lim_{n \to \infty} \frac{1}{c_n \sqrt{n}}.$$

Exercise 8.3.7. Show that $\lim c_n = 0$ but $\sum_{n=0}^\infty c_n$ diverges.
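
Before proving these facts it can help to watch them happen. The sketch below is a numerical illustration only; the names `c` and `partial_sum` are hypothetical helpers of ours, and floating-point products replace the exact fractions. It computes c_n from the ratio of odd to even products, checks how close 1/(c_n sqrt(n)) is to sqrt(pi), and accumulates partial sums of the series.

```python
import math


def c(n):
    """c_n = (1*3*5*...*(2n-1)) / (2*4*6*...*(2n)), with c_0 = 1."""
    value = 1.0
    for k in range(1, n + 1):
        value *= (2 * k - 1) / (2 * k)
    return value


def partial_sum(N):
    """c_0 + c_1 + ... + c_N, updating each term from the previous one."""
    total, term = 0.0, 1.0
    for n in range(N + 1):
        total += term
        term *= (2 * n + 1) / (2 * n + 2)  # turns c_n into c_{n+1}
    return total


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, c(n), 1 / (c(n) * math.sqrt(n)), partial_sum(n))
    print("sqrt(pi) =", math.sqrt(math.pi))
```

The terms shrink roughly like 1/sqrt(n), slowly enough that the partial sums keep growing, which is consistent with the two claims of the exercise.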

The divergence of $\sum_{n=0}^\infty c_n$ makes sense when we consider the Taylor series
for $1/\sqrt{1-x}$. We want to determine the values of $x$ for which

$$(4) \qquad \frac{1}{\sqrt{1-x}} = \sum_{n=0}^\infty c_n x^n,$$

and $x = 1$ is not in the domain of the left side. We do aim to prove (4) for all
$x \in (-1, 1)$ but the usual word of warning is in order. Having computed the
coefficients $c_n$, it is not enough to simply argue that the series on the right side
converges when $|x| < 1$. To properly establish (4) we are going to show that
the error function

$$E_N(x) = \frac{1}{\sqrt{1-x}} - \sum_{n=0}^N c_n x^n$$

tends to zero as $N \to \infty$. Back in Section 6.6, the primary tool we used for this
task was Lagrange’s Remainder Theorem (Theorem 6.6.3), but it is not up to
this particular challenge.


Exercise 8.3.8. Using the expression for $E_N(x)$ from Lagrange’s Remainder
Theorem, show that equation (4) is valid for all $|x| < 1/2$. What goes wrong
when we try to use this method to prove (4) for $x \in (1/2, 1)$?


The Integral Form of the Remainder 

The moral of the previous exercise is that we need a different method for
estimating $E_N(x)$. The Lagrange form of the remainder grows out of the Mean
Value Theorems and yields a formula for the error function in terms of the
derivative $f^{(N+1)}$. Now that we are in possession of a proper definition of the
integral, we can derive another useful formula for $E_N(x)$.

Theorem 8.3.1 (Integral Remainder Theorem). Let $f$ be differentiable
$N + 1$ times on $(-R, R)$ and assume $f^{(N+1)}$ is continuous. Define $a_n =
f^{(n)}(0)/n!$ for $n = 0, 1, \ldots, N$, and let

$$S_N(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N.$$

For all $x \in (-R, R)$, the error function $E_N(x) = f(x) - S_N(x)$ satisfies

$$E_N(x) = \frac{1}{N!} \int_0^x f^{(N+1)}(t)(x - t)^N\,dt.$$

Proof. The case $x = 0$ is easy to check, so let’s take $x \ne 0$ in $(-R, R)$ and keep
in mind that $x$ is a fixed constant in what follows. To avoid a few technical
distractions, let’s just consider the case $x > 0$.

Exercise 8.3.9. (a) Show

$$f(x) = f(0) + \int_0^x f'(t)\,dt.$$

(b) Now use a previous result from this section to show

$$f(x) = f(0) + f'(0)x + \int_0^x f''(t)(x - t)\,dt.$$

(c) Continue in this fashion to complete the proof of the theorem. □

To gain a better understanding of this formulation for $E_N(x)$ and
simultaneously make some headway on our exploration of equation (4), let’s return to
the special case $f(x) = 1/\sqrt{1-x}$.

Exercise 8.3.10. (a) Make a rough sketch of $1/\sqrt{1-x}$ and $S_2(x)$ over the
interval $(-1, 1)$, and compute $E_2(x)$ for $x = 1/2$, $3/4$, and $8/9$.

(b) For a general $x$ satisfying $-1 < x < 1$, show

$$E_2(x) = \frac{15}{16} \int_0^x \left(\frac{x - t}{1 - t}\right)^2 \frac{1}{(1 - t)^{3/2}}\,dt.$$

(c) Explain why the inequality

$$\left| \frac{x - t}{1 - t} \right| \le |x|$$

is valid, and use this to find an overestimate for $|E_2(x)|$ that no longer
involves an integral. Note that this estimate will necessarily depend on $x$.
Confirm that things are going well by checking that this overestimate is
in fact larger than $|E_2(x)|$ at the three computed values from part (a).

(d) Finally, show $E_N(x) \to 0$ as $N \to \infty$ for an arbitrary $x \in (-1, 1)$.
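
For readers who want to check their arithmetic in parts (a) and (c), here is a small Python sketch. It is a sanity check under our own assumptions rather than a solution: the names `S2`, `E2`, and `E2_bound` are hypothetical, the coefficients 1, 1/2, 3/8 are the values of c_0, c_1, c_2, and the bound shown is one possible overestimate for 0 <= x < 1, obtained by replacing the squared factor in the integrand with x^2 and integrating what remains.

```python
import math


def f(x):
    """The function 1 / sqrt(1 - x) on (-1, 1)."""
    return 1.0 / math.sqrt(1.0 - x)


def S2(x):
    """Second Taylor polynomial: c_0 + c_1 x + c_2 x^2 with c = 1, 1/2, 3/8."""
    return 1.0 + x / 2.0 + 3.0 * x * x / 8.0


def E2(x):
    """Error f(x) - S2(x)."""
    return f(x) - S2(x)


def E2_bound(x):
    """One overestimate for 0 <= x < 1:
    (15/16) x^2 * integral_0^x (1 - t)^(-3/2) dt = (15/8) x^2 (f(x) - 1)."""
    return (15.0 / 8.0) * x * x * (f(x) - 1.0)


if __name__ == "__main__":
    for x in (1 / 2, 3 / 4, 8 / 9):
        print(x, E2(x), E2_bound(x))
```

At each of the three sample points the bound exceeds the actual error, as part (c) predicts it should.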
Having established that the Taylor series in (4) does indeed converge for
all $|x| < 1$, it is now clear sailing to produce a Taylor series representation for
$\arcsin(x)$. The first step is to substitute $x^2$ for $x$ in (4) to get

$$\frac{1}{\sqrt{1 - x^2}} = \sum_{n=0}^\infty c_n x^{2n} \quad \text{for all } |x| < 1.$$

The next step is to take the term-by-term anti-derivative of this series. Any 
time we start manipulating infinite series as though they were finite in nature 
we need to pause and make sure we are on solid footing. 

Exercise 8.3.11. Assuming that the derivative of $\arcsin(x)$ is indeed $1/\sqrt{1 - x^2}$,
supply the justification that allows us to conclude

$$(5) \qquad \arcsin(x) = \sum_{n=0}^\infty \frac{c_n}{2n+1}\, x^{2n+1} \quad \text{for all } |x| < 1.$$

Exercise 8.3.12. Our work thus far shows that the Taylor series in (5) is valid
for all $|x| < 1$, but note that $\arcsin(x)$ is continuous for all $|x| \le 1$. Carefully
explain why the series in (5) converges uniformly to $\arcsin(x)$ on the closed
interval $[-1, 1]$.


Summing $\sum_{n=1}^\infty 1/n^2$

Every proof of Euler’s sum contains a moment of genuine ingenuity at some
point, and this is where our proof takes an unanticipated turn.

Let’s make the substitution $x = \sin(\theta)$ in (5) where we restrict our attention
to $-\pi/2 \le \theta \le \pi/2$. The result is

$$\theta = \arcsin(\sin(\theta)) = \sum_{n=0}^\infty \frac{c_n}{2n+1} \sin^{2n+1}(\theta),$$

which converges uniformly on $[-\pi/2, \pi/2]$.

Exercise 8.3.13. (a) Show

$$\int_0^{\pi/2} \theta\,d\theta = \sum_{n=0}^\infty \frac{c_n}{2n+1}\, b_{2n+1},$$

being careful to justify each step in the argument. The term $b_{2n+1}$ refers
back to our earlier work on Wallis’s product.

(b) Deduce

$$\frac{\pi^2}{8} = \sum_{n=0}^\infty \frac{1}{(2n+1)^2},$$

and use this to finish the proof that $\pi^2/6 = \sum_{n=1}^\infty 1/n^2$.
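
The step from part (a) to part (b) hinges on a cancellation between the coefficients c_n and the integrals b_{2n+1}. The Python sketch below checks that cancellation with exact rational arithmetic. It is an illustration under assumptions: the helper names `c`, `b_odd`, and `term` are ours, and the closed form used for b_{2n+1} is the product that the recurrence of Exercise 8.3.3 generates starting from b_1 = 1.

```python
from fractions import Fraction
import math


def c(n):
    """c_n = (1*3*...*(2n-1)) / (2*4*...*(2n)) as an exact fraction."""
    value = Fraction(1)
    for k in range(1, n + 1):
        value *= Fraction(2 * k - 1, 2 * k)
    return value


def b_odd(n):
    """b_{2n+1} = (2*4*...*(2n)) / (3*5*...*(2n+1)), from b_1 = 1."""
    value = Fraction(1)
    for k in range(1, n + 1):
        value *= Fraction(2 * k, 2 * k + 1)
    return value


def term(n):
    """The n-th term c_n / (2n + 1) * b_{2n+1} of the series in part (a)."""
    return c(n) / (2 * n + 1) * b_odd(n)


if __name__ == "__main__":
    print([term(n) for n in range(5)])  # 1, 1/9, 1/25, 1/49, 1/81
    approx = sum(1.0 / (2 * n + 1) ** 2 for n in range(100000))
    print(approx, math.pi ** 2 / 8)
```

Seeing each term collapse to 1/(2n+1)^2 suggests what needs to be proved in general; the numerical sum is only a plausibility check on the value pi^2/8.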
The Riemann-Zeta Function

Euler’s determination of the value of $\sum 1/n^2$ brought him international
recognition and represented a significant milestone in what would be a lifelong
exploration of series of the form $\sum 1/n^s$. Euler’s original argument for summing
$\sum 1/n^2$ discussed in Section 6.1 involved equating the coefficient of $x^2$ in two
different series expansions for $\sin(x)/x$. By equating the coefficients of higher
powers of $x$ he was also able to sum $\sum 1/n^s$ for $s = 4, 6, 8, 10$ and $12$. (Try it
for $s = 4$.) Eventually, Euler worked out a general formula for any even natural
number, and in the process he shifted his focus to thinking about $\sum 1/n^s$ as a
function of the variable $s$. The iconic notation

$$\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} \quad \text{for all } s > 1,$$

and the name, the Riemann-zeta function, would come one hundred years
later, but it was Euler who first unearthed many deep properties of this
function. Significant among these is a connection to the prime numbers, evident in
the Eulerian formula

$$(6) \qquad \zeta(s) = \prod_p \frac{1}{1 - p^{-s}},$$

where the product is taken over all the primes. The mathematics underlying
the Riemann-zeta function gets complicated very quickly, but this particular
formula is actually quite accessible. Notice that for each prime $p$,

$$\frac{1}{1 - p^{-s}} = 1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \frac{1}{p^{3s}} + \frac{1}{p^{4s}} + \cdots.$$


Multiplying out the product on the right in (6) in this fashion and using the 
fact that every n G N is a unique product of primes leads naturally to the given 
relationship. 
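
The product formula can be tested numerically at s = 2, where the previous subsection tells us the sum equals pi^2/6. The following Python sketch is a rough experiment, not a proof: the helper names `primes_up_to`, `euler_product`, and `zeta_partial` are our own, the sieve cutoff is an arbitrary choice, and both the product and the sum are truncated.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def euler_product(s, limit):
    """Product of 1 / (1 - p**(-s)) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product


def zeta_partial(s, terms):
    """Partial sum 1/1**s + 1/2**s + ... + 1/terms**s."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))


if __name__ == "__main__":
    print(euler_product(2, 10000), zeta_partial(2, 100000), math.pi ** 2 / 6)
```

Both truncations land within about 1e-4 of pi^2/6, which is what we would expect if the two sides of the formula agree.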

Euler returned to study $\zeta(s)$ many times in his career, in part it seems to
tend to the unfinished business of evaluating $\sum 1/n^s$ for the odd integers. Amid
his many successes, this was a challenge that eluded Euler, as it has eluded every
mathematician since.


8.4 Inventing the Factorial Function 

The goal of this section is to produce a function $f(x)$, defined on all of R,
with the property that $f(n) = n!$ for each $n \in \mathbf{N}$. With no other restriction on
$f$, this is as easy as it is uninteresting: simply define $f$ piecewise in such a way
that it passes through the points (1,1), (2,2), (3,6), (4,24), and so on. Letting

$$f(x) = \begin{cases} n! & \text{if } n \le x < n + 1,\ n \in \mathbf{N} \\ 1 & \text{if } x < 1 \end{cases}$$

does the trick.

To make this problem meaningful we need to be much more discriminating
about what properties we require $f$ to have. Should $f$ be continuous?
Differentiable? Twice differentiable? We shall see about this. This problem actually
has its origins in a series of letters exchanged in 1729 between Christian Goldbach
(of “Goldbach’s Conjecture” fame, although that is a different story) and Leonhard Euler.
The term “function” in Euler’s day implicitly referred to a mapping defined by
an analytic expression composed of the elementary functions and operations
of calculus. Logarithms, exponentials, polynomials, and power series were
examples of 18th century functions; the piecewise concoction proposed above was
not.

Thus, a better statement of our goal, although still a little imprecise, is to
find a function defined by a single, organic formula which extends the definition
of $n!$ in a meaningful way to non-natural numbers.

Exercise 8.4.1. For $n \in \mathbf{N}$, let

$$n\# = n + (n - 1) + (n - 2) + \cdots + 2 + 1.$$

(a) Without looking ahead, decide if there is a natural way to define 0#. How 
about (—2)#? Conjecture a reasonable value for 

(b) Now prove $n\# = \frac{1}{2}n(n + 1)$ for all $n \in \mathbf{N}$, and revisit part (a).

The formula in part (b) of the previous exercise not only simplifies the
calculation of $n\#$ for large values of $n$, but also yields a properly defined function on
R when the discrete variable $n$ is replaced with the continuous variable $x$.
Indeed, Euler would be perfectly comfortable with the expression $x\# = \frac{1}{2}x(x + 1)$.

We are seeking something similar for $n!$. What is the right definition for $x!$
when $x \in \mathbf{R}$?

The Exponential Function 

The idea of extending the definition of a function defined on N to all of R may
at first sound like a somewhat whimsical enterprise, but it is perfectly analogous
to the way we come to understand a function like $2^x$. Similar to $n!$, $2^n$ for $n \in \mathbf{N}$
is unambiguous and meaningful the minute we understand multiplication, but
something like $2^{-\pi}$ is another matter. Because it is instructive, and because
we are going to presently need functions of the form $t^x$, let’s take a moment to
define exponential functions in a rigorous way.

Typically the way a function like $2^x$ gets defined on R is through a series of
domain expansions. Starting with $2^n$, we first expand the domain to Z using
reciprocals, then to Q using roots, and finally to R using continuity. Although
we could follow this strategy, we are going to take a different approach that has
the advantage of yielding the important properties we need more efficiently.

Step one is to properly define the natural exponential function $e^x$. Back
in Chapter 6, we assumed $e^x$ was already defined and showed how it could be
represented by its Taylor series. Here we flip this process around. The problem
on the table is to rigorously construct a proper definition for $e^x$, and the theory
of power series gives us a bedrock foundation on which to build.

Define

$$(1) \qquad E(x) = \sum_{n=0}^\infty \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$

Exercise 8.4.2. Verify that the series converges absolutely for all $x \in \mathbf{R}$, that
$E(x)$ is differentiable on R, and $E'(x) = E(x)$.

Exercise 8.4.3. (a) Use the results of Exercise 2.8.7 and the binomial
formula to show that $E(x + y) = E(x)E(y)$ for all $x, y \in \mathbf{R}$.

(b) Show that $E(0) = 1$, $E(-x) = 1/E(x)$, and $E(x) > 0$ for all $x \in \mathbf{R}$.
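
The identities in the last two exercises can be spot-checked numerically before they are proved. The Python sketch below is a minimal illustration: the function name `E` matches the text, but the cutoff of 60 terms is an arbitrary assumption of ours that is adequate only for modest values of x.

```python
import math


def E(x, terms=60):
    """Partial sum of the series sum_{n >= 0} x**n / n! with `terms` terms."""
    total = 0.0
    term = 1.0  # x**0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)  # turns x**n / n! into x**(n+1) / (n+1)!
    return total


if __name__ == "__main__":
    x, y = 1.3, 0.9
    print(E(x + y), E(x) * E(y))   # E(x + y) versus E(x)E(y)
    print(E(-2.0) * E(2.0))        # should be close to 1
    print(E(1.0), math.e)          # E(1) versus the constant e
```

Agreement to many decimal places is evidence, not proof; the exercises ask for the argument that works for every real x and y.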

The takeaway here is that the power series $E(x)$ satisfies all the properties
we associate with the exponential function, and we can therefore give ourselves
permission to go back to the more familiar notation $e^x$ in place of $E(x)$. What
happens if we have a momentary relapse and interpret $e^x$ as the real number
$e \approx 2.71828\ldots$ raised to the power $x$ rather than $E(x)$? Not to worry: the two
interpretations coincide, whenever the former is defined in the usual way.

Exercise 8.4.4. Define $e = E(1)$. Show $E(n) = e^n$ and $E(m/n) = (\sqrt[n]{e})^m$ for
all $m, n \in \mathbf{Z}$.

One final property of $e^x$ we need is its behavior as $x \to \infty$.

Definition 8.4.1. Given $f : [a, \infty) \to \mathbf{R}$, we say that $\lim_{x \to \infty} f(x) = L$ if,
for all $\epsilon > 0$, there exists $M > a$ such that whenever $x > M$ it follows that
$|f(x) - L| < \epsilon$.

Exercise 8.4.5. Show $\lim_{x \to \infty} x^n e^{-x} = 0$ for all $n = 0, 1, 2, \ldots$.

To get started notice that when $x > 0$, all the terms in (1) are positive.


Other Bases 

Having set $e^x$ on solid mathematical footing, we can now do the same for $t^x$
where $t > 0$. This requires use of the natural logarithm.

Exercise 8.4.6. (a) Explain why we know $e^x$ has an inverse function (let’s
call it $\log x$) defined on the strictly positive real numbers and satisfying

(i) $\log(e^y) = y$ for all $y \in \mathbf{R}$ and

(ii) $e^{\log x} = x$, for all $x > 0$.

(b) Prove $(\log x)' = 1/x$. (See Exercise 5.2.12.)

(c) Fix $y > 0$ and differentiate $\log(xy)$ with respect to $x$. Conclude that

$$\log(xy) = \log x + \log y \quad \text{for all } x, y > 0.$$

(d) For $t > 0$ and $n \in \mathbf{N}$, $t^n$ has the usual interpretation as $t \cdot t \cdots t$ ($n$ times).
Show that

$$(2) \qquad t^n = e^{n \log t} \quad \text{for all } n \in \mathbf{N}.$$

Part (d) of the previous exercise is the pivotal formula because the expression
on the right of the equal sign is meaningful if we replace $n$ with $x \in \mathbf{R}$. This
is our cue to use the identity in (2) as a template for the definition of $t^x$ on all
of R.

Definition 8.4.2. Given $t > 0$, define the exponential function $t^x$ to be

$$t^x = e^{x \log t} \quad \text{for all } x \in \mathbf{R}.$$

Exercise 8.4.7. (a) Show $t^{m/n} = (\sqrt[n]{t})^m$ for all $m, n \in \mathbf{N}$.

(b) Show $\log(t^x) = x \log t$, for all $t > 0$ and $x \in \mathbf{R}$.

(c) Show $t^x$ is differentiable on R and find the derivative.
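
Definition 8.4.2 translates directly into a few lines of code. The sketch below is an illustration only; the function name `power` is a hypothetical helper of ours, and the library routines `math.exp` and `math.log` stand in for the functions E and log constructed in the text. It compares the definition with Python's built-in exponentiation at an integer, a rational, and an irrational exponent.

```python
import math


def power(t, x):
    """t**x computed as exp(x * log(t)), following Definition 8.4.2 (t > 0)."""
    if t <= 0:
        raise ValueError("the definition requires t > 0")
    return math.exp(x * math.log(t))


if __name__ == "__main__":
    print(power(2.0, 10), 2.0 ** 10)                  # integer exponent
    print(power(8.0, 2.0 / 3.0), 4.0)                 # cube root of 8, squared
    print(power(2.0, -math.pi), 2.0 ** (-math.pi))    # irrational exponent
```

The requirement t > 0 is enforced explicitly because log is defined only on the strictly positive reals.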

Finding the right definition for $x!$ is harder than defining $t^x$, but the strategy
is essentially the same. We are seeking a formula of the form $n! = g(n)$ where
$g$ yields a meaningful formula when $n$ is replaced by $x$. What might such a
function $g(x) = x!$ look like when graphed over R? For $x > 0$ it must grow
extremely rapidly to keep up with $n!$, but how about on $x < 0$? Using a
functional equation for $x!$ we can create a reasonable artist’s rendering of the
function we are looking for.


The Functional Equation 

A defining property of the factorial on N is that $1! = 1$ and $n! = n(n - 1)!$ for all
$n \ge 2$. Thus it seems reasonable to require the same from our currently mythic
function $x!$ defined on R. Whatever $x!$ means it should satisfy

$$x! = x(x - 1)! \quad \text{for all } x \in \mathbf{R}.$$

Setting $x = 1$ in this equation, for example, yields $1 = 0!$.

Exercise 8.4.8. Inspired by the fact that $0! = 1$ and $1! = 1$, let $h(x)$ satisfy

(i) $h(x) = 1$ for all $0 \le x \le 1$, and

(ii) $h(x) = x\,h(x - 1)$ for all $x \in \mathbf{R}$.

(a) Find a formula for $h(x)$ on $[1, 2]$, $[2, 3]$, and $[n, n + 1]$ for arbitrary
$n \in \mathbf{N}$.

(b) Now do the same for $[-1, 0]$, $[-2, -1]$, and $[-n, -n + 1]$.

(c) Sketch $h$ over the domain $[-4, 4]$.
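
Conditions (i) and (ii) determine h recursively, and a short program can generate values to plot for part (c). The Python sketch below is one possible way to evaluate h, offered as an aid to sketching rather than as the formulas the exercise asks for. It assumes that for negative x we may rearrange (ii) into h(x) = h(x + 1)/(x + 1), which breaks down exactly when x + 1 = 0.

```python
def h(x):
    """Evaluate h from h(x) = 1 on [0, 1] and h(x) = x * h(x - 1)."""
    if 0 <= x <= 1:
        return 1.0
    if x > 1:
        # Step down toward the interval [0, 1].
        return x * h(x - 1)
    # x < 0: solve h(x + 1) = (x + 1) * h(x) for h(x).
    # At x = -1, -2, -3, ... this divides by zero, so h is undefined there.
    return h(x + 1) / (x + 1)


if __name__ == "__main__":
    for x in (4, 2.5, 0.5, -0.5, -1.5, -2.5):
        print(x, h(x))
```

Evaluating `h` at a negative integer raises `ZeroDivisionError`, which mirrors the breakdown of the functional equation at those points.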
Notice that $h(x)$ satisfies $h(n) = n!$ and it is at least continuous for $x > 0$,
but its piecewise definition and its many non-differentiable corners disqualify it
from being our sought after factorial function. One legitimate conclusion that
arises out of this exercise is that $x!$, when we find it, will exhibit the same
asymptotic behavior as $h$ at $x = -1, -2, -3, \ldots$, and thus won’t be defined on
the negative integers.


Improper Riemann Integrals 

For reasons that will become clear, we need to make rigorous sense of an
expression like

$$\int_0^\infty e^{-t}\,dt.$$

Most likely familiar from calculus, integrals over unbounded regions like $[0, \infty)$
are called improper Riemann integrals and are defined by taking the limit of
“proper” integrals.

Definition 8.4.3. Assume $f$ is defined on $[a, \infty)$ and integrable on every
interval of the form $[a, b]$. Then define $\int_a^\infty f$ to be

$$\lim_{b \to \infty} \int_a^b f,$$

provided the limit exists. In this case we say the improper integral $\int_a^\infty f$
converges.

Exercise 8.4.9. (a) Show that the improper integral $\int_a^\infty f$ converges if and
only if, for all $\epsilon > 0$ there exists $M > a$ such that whenever $d > c > M$ it
follows that

$$\left| \int_c^d f \right| < \epsilon.$$

(In one direction it will be useful to consider the sequence $a_n = \int_a^{a+n} f$.)

(b) Show that if $0 \le f \le g$ and $\int_a^\infty g$ converges then $\int_a^\infty f$ converges.

(c) Part (a) is a Cauchy criterion, and part (b) is a comparison test. State
and prove an absolute convergence test for improper integrals.


Exercise 8.4.10. (a) Use the properties of $e^t$ previously discussed to show

$$\int_0^\infty e^{-t}\,dt = 1.$$

(b) Show

$$(3) \qquad \frac{1}{a} = \int_0^\infty e^{-at}\,dt, \quad \text{for all } a > 0.$$

Just for a moment, let’s take our analysis gloves off and ask what we think
might happen if we differentiate formula (3) with respect to $a$.

On the left-hand side we certainly get

$$\left(\frac{1}{a}\right)' = -\frac{1}{a^2}.$$

On the right-hand side of (3), let’s brazenly crash through the integral sign and
take the derivative of the integrand $e^{-at}$ with respect to $a$ (thinking of $t$ as a
constant). The result is

$$\left(e^{-at}\right)' = -t\,e^{-at}.$$

The question, then, is whether this is a valid manipulation. Is it true that

$$(4) \qquad \frac{1}{a^2} = \int_0^\infty t\,e^{-at}\,dt\,?$$

Well, let’s compute the integral and find out.

Well, let’s compute the integral and find out. 

Exercise 8.4.11. (a) Evaluate $\int_0^b t\,e^{-at}\,dt$ using the integration-by-parts
formula from Exercise 7.5.6. The result will be an expression in $a$ and $b$.

(b) Now compute $\int_0^\infty t\,e^{-at}\,dt$ and verify equation (4).
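
A numerical check is a useful companion to the hand computation. The Python sketch below compares a midpoint-rule approximation of the integral over [0, b] with a candidate closed form. Both function names are hypothetical, the step count is an arbitrary choice, and the closed form shown is what integration by parts appears to give, so treat it as something to confirm rather than as the official answer.

```python
import math


def closed_form(a, b):
    """Candidate value of integral_0^b t*exp(-a*t) dt from integration by parts:
    (1 - exp(-a*b)) / a**2 - b * exp(-a*b) / a."""
    return (1.0 - math.exp(-a * b)) / a ** 2 - b * math.exp(-a * b) / a


def midpoint_integral(a, b, steps=100000):
    """Midpoint-rule approximation of integral_0^b t*exp(-a*t) dt."""
    width = b / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += t * math.exp(-a * t)
    return total * width


if __name__ == "__main__":
    a = 2.0
    for b in (1.0, 3.0, 10.0, 50.0):
        print(b, midpoint_integral(a, b), closed_form(a, b), 1 / a ** 2)
```

As b grows, both columns settle near 1/a^2, in line with equation (4).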

Apparently, our bold differentiation of equation (3) into equation (4) worked 
out. Now it’s time to put our analysis gloves back on and see why this is so. 


Differentiating Under the Integral 

Let $f(x, t)$ be a function of two variables, defined for all $a \le x \le b$ and $c \le t \le d$.
The domain of $f$ is then a rectangle $D$ in $\mathbf{R}^2$.

What does it mean to say $f$ is continuous at a point $(x_0, t_0)$ in $D$? Section 8.2
on metric spaces gives a more thorough explanation, but the only real difference
from the single variable setting is that we have to replace our sense of distance
between points $(x_0, t_0)$ and $(x, t)$ with the familiar Euclidean distance formula

$$\|(x, t) - (x_0, t_0)\| = \sqrt{(x - x_0)^2 + (t - t_0)^2}.$$

Definition 8.4.4. A function $f : D \to \mathbf{R}$ is continuous at $(x_0, t_0)$ if for all
$\epsilon > 0$, there exists $\delta > 0$ such that whenever $\|(x, t) - (x_0, t_0)\| < \delta$, it follows
that

$$|f(x, t) - f(x_0, t_0)| < \epsilon.$$

Exercise 8.4.12. Assume the function $f(x, t)$ is continuous on the rectangle
$D = \{(x, t) : a \le x \le b,\ c \le t \le d\}$. Explain why the function

$$F(x) = \int_c^d f(x, t)\,dt$$

is properly defined for all $x \in [a, b]$.

It should not be too surprising that Theorem 4.4.7 has an analogue in the
$\mathbf{R}^2$ setting. The set $D$ is compact in $\mathbf{R}^2$, and a continuous function on $D$ is
uniformly continuous in the sense that the $\delta$ in Definition 8.4.4 can be chosen
independently of the point $(x_0, t_0)$.

Theorem 8.4.5. If $f(x, t)$ is continuous on $D$, then $F(x) = \int_c^d f(x, t)\,dt$ is
uniformly continuous on $[a, b]$.

Exercise 8.4.13. Prove Theorem 8.4.5. 


Taking inspiration from equations (3) and (4), let’s add the assumption that
for each fixed value of $t$ in $[c, d]$, the function $f(x, t)$ is a differentiable function
of $x$; that is,

$$f_x(x, t) = \lim_{z \to x} \frac{f(z, t) - f(x, t)}{z - x}$$

exists for all $(x, t) \in D$. In addition, let’s assume that the derivative function
$f_x(x, t)$ is continuous.

Theorem 8.4.6. If $f(x, t)$ and $f_x(x, t)$ are continuous on $D$, then the function
$F(x) = \int_c^d f(x, t)\,dt$ is differentiable and

$$F'(x) = \int_c^d f_x(x, t)\,dt.$$

Proof. Fix $x$ in $[a, b]$ and let $\epsilon > 0$ be arbitrary. Our task is to find a $\delta > 0$ such
that

$$\left| \frac{F(z) - F(x)}{z - x} - \int_c^d f_x(x, t)\,dt \right| < \epsilon$$

whenever $0 < |z - x| < \delta$.


Exercise 8.4.14. Finish the proof of Theorem 8.4.6 


□ 


Improper Integrals, Revisited 

Theorem 8.4.6 is a formal justification for differentiating under the integral sign, 
but we need to extend this result to the case where the integral is improper. 
Looking back one more time to our motivating example in equation (3), we see 
that what we have is a function f(x,t) where the domain of the variable t is the 
unbounded interval c < t < oo. 

Let’s fix $x$ from some set $A \subseteq \mathbf{R}$. For such an $x$, we define

$$(6) \qquad F(x) = \int_c^\infty f(x, t)\,dt = \lim_{d \to \infty} \int_c^d f(x, t)\,dt,$$

provided the limit exists.

Notice that the formula in (6) is a pointwise statement. Given an $x \in A$ and
$\epsilon > 0$, we can find an $M$ (perhaps dependent on $x$) where

$$\left| F(x) - \int_c^d f(x, t)\,dt \right| < \epsilon$$

whenever $d > M$. As we have seen on numerous occasions, the elixir required
to ensure that good behavior in the finite setting extends to the infinite setting
is uniformity.

Definition 8.4.7. Given $f(x, t)$ defined on $D = \{(x, t) : x \in A,\ c \le t\}$, assume
$F(x) = \int_c^\infty f(x, t)\,dt$ exists for all $x \in A$. We say the improper integral converges
uniformly to $F(x)$ on $A$ if for all $\epsilon > 0$, there exists $M > c$ such that

$$\left| F(x) - \int_c^d f(x, t)\,dt \right| < \epsilon$$

for all $d > M$ and all $x \in A$.

Exercise 8.4.15. (a) Show that the improper integral J 0 °° e~ xt dt converges 

uniformly to 1/x on the set [1/2, oo). 


(b) Is the convergence uniform on (0, oo)? 


Exercise 8.4.16. Prove the following analogue of the Weierstrass M-Test for 
improper integrals: If /(x, t ) satisfies |/(x, t)\ < g(t) for all x G A and J a °° g(t)dt 
converges, then J a °° f(x,t)dt converges uniformly on A. 
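To get a feel for Definition 8.4.7, it can help to tabulate the worst-case tail error over a set $A$ for a few values of $d$. The sketch below uses an integrand of our own choosing (an assumption, not an example from the text): $f(x,t) = e^{-t}\cos(xt)$ on $t \ge 0$, which is dominated by $g(t) = e^{-t}$ and has $F(x) = 1/(1+x^2)$. The closed form for the integral over $[0,d]$ comes from integrating by parts twice, and the sampled grid `xs` only stands in for the set $A$.

```python
import math

# Illustrative integrand (an assumption, not the text's example):
# f(x, t) = exp(-t) * cos(x t), dominated by g(t) = exp(-t) for every real x.
def F(x):
    # value of the improper integral over [0, infinity)
    return 1.0 / (1.0 + x * x)

def partial_integral(x, d):
    # closed form of the integral of exp(-t) cos(x t) over [0, d]
    return (1.0 + math.exp(-d) * (x * math.sin(x * d) - math.cos(x * d))) / (1.0 + x * x)

def worst_tail_error(d, xs):
    # largest |F(x) - integral over [0, d]| across the sampled x values
    return max(abs(F(x) - partial_integral(x, d)) for x in xs)

xs = [k / 10.0 for k in range(-500, 501)]   # finite sample of A = [-50, 50]
for d in (1.0, 5.0, 10.0):
    # the second printed number is the tail of the dominating integral, exp(-d)
    print(d, worst_tail_error(d, xs), math.exp(-d))
```

In this example the worst-case error stays below the tail $\int_d^\infty g(t)\,dt = e^{-d}$, which does not depend on $x$; that independence is exactly what the word "uniformly" demands.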

An immediate consequence of Definition 8.4.7 is that if the improper integral
converges uniformly then the sequence of functions defined by

$$F_n(x) = \int_c^{c+n} f(x,t)\,dt$$

converges uniformly to $F(x)$ on $[a,b]$. This observation gives us access to the
host of useful results we developed in Chapter 6.


Theorem 8.4.8. If $f(x,t)$ is continuous on $D = \{(x,t) : a \le x \le b,\ c \le t\}$,
then

$$F(x) = \int_c^\infty f(x,t)\,dt$$

is uniformly continuous on $[a,b]$, provided the integral converges uniformly.


Exercise 8.4.17. Prove Theorem 8.4.8. 


Theorem 8.4.9. Assume the function $f(x,t)$ is continuous on
$D = \{(x,t) : a \le x \le b,\ c \le t\}$ and $F(x) = \int_c^\infty f(x,t)\,dt$ exists for each $x \in [a,b]$. If the
derivative function $f_x(x,t)$ exists and is continuous, then

$$F'(x) = \int_c^\infty f_x(x,t)\,dt, \tag{7}$$

provided the integral in (7) converges uniformly.


Exercise 8.4.18. Prove Theorem 8.4.9.


The Factorial Function

It’s time to return our attention to equation (3) from earlier in this section:

$$\frac{1}{\alpha} = \int_0^\infty e^{-\alpha t}\,dt, \quad \text{for all } \alpha > 0.$$

Exercise 8.4.19. (a) Although we verified it directly, show how to use the
theorems in this section to give a second justification for the formula

$$\frac{1}{\alpha^2} = \int_0^\infty t e^{-\alpha t}\,dt, \quad \text{for all } \alpha > 0.$$

(b) Now derive the formula

$$\frac{n!}{\alpha^{n+1}} = \int_0^\infty t^n e^{-\alpha t}\,dt, \quad \text{for all } \alpha > 0. \tag{8}$$

If we set $\alpha = 1$ in equation (8) we get

$$n! = \int_0^\infty t^n e^{-t}\,dt.$$

The appearance of $n!$ on the left side of this equation is an exciting development,
especially because where $n$ appears on the right it can be meaningfully replaced
by a real variable $x$, at least when $x \ge 0$. This is the equation we have been
looking for!

Definition 8.4.10. For $x \ge 0$, define the factorial function

$$x! = \int_0^\infty t^x e^{-t}\,dt.$$
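Definition 8.4.10 can be tested numerically against the ordinary factorial. The following is a rough sketch under stated assumptions: the improper integral is truncated at a hypothetical `cutoff` of 60 (the neglected tail is negligible for the small values of $x$ used here), the helper `simpson` is our own, and Python's `math.gamma(x + 1)` is used as an independent reference value for non-integer $x$.

```python
import math

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def factorial_integral(x, cutoff=60.0, panels=6000):
    # integrate t^x e^{-t} over [0, cutoff]; the tail beyond the cutoff is ignored
    return simpson(lambda t: t ** x * math.exp(-t), 0.0, cutoff, panels)

for n in range(6):
    # integer arguments: compare with the usual n!
    print(n, factorial_integral(n), math.factorial(n))

# a non-integer argument: compare with the library gamma function, shifted by one
print(2.5, factorial_integral(2.5), math.gamma(3.5))
```

The truncation and the quadrature rule are conveniences for the experiment; neither plays any role in the definition itself.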




Exercise 8.4.20. (a) Show that $x!$ is an infinitely differentiable function on
$(0, \infty)$ and produce a formula for the $n$th derivative. In particular show
that $(x!)'' > 0$.

(b) Use the integration-by-parts formula employed earlier to show that $x!$
satisfies the functional equation

$$(x + 1)! = (x + 1)\,x!.$$

The previous exercise is our first piece of evidence that we have found the
right definition for $x!$. There is more to come.

A consequence of $(x!)'' > 0$ is that $x!$ is a convex function. In calculus this is
usually referred to as “concave up” and means that the line segment connecting
two points on the graph of $x!$ always sits above the curve. Said another way,
there are no inflection points in $x!$ and the slope of the curve steadily increases
as the graph passes through the points $(n, n!)$ for $n = 0, 1, 2, \ldots$. We did not
mention this property at the time, but reflecting on our earlier analogy between
$2^x$ and $x!$, convexity is a natural condition to desire in our factorial function.

Figure 8.1: INCREASING CHORD SLOPES ON A CONVEX FUNCTION.

In fact, not only is $x!$ convex but $\log(x!)$ is also convex. This is a stronger
statement. (Consider, for instance, the graphs of $x^2 + 1$ and $\log(x^2 + 1)$.) The
proof is a little technical and we won’t go through it, but the fact that $\log(x!)$
is convex on $x \ge 0$ is quite significant. Here’s why.

Theorem 8.4.11 (Bohr-Mollerup Theorem). There is a unique positive
function $f$ defined on $x \ge 0$ satisfying

(i) $f(0) = 1$,

(ii) $f(x + 1) = (x + 1)f(x)$, and

(iii) $\log(f(x))$ is convex.

Because $x!$ satisfies properties (i), (ii), and (iii), it follows that $f(x) = x!$.


Proof. We need one more geometrically plausible fact about convex functions.
If $[a,b]$ and $[a',b']$ are two intervals in the domain of a convex function $\phi$, and
$a \le a'$ and $b \le b'$, then the slopes of the chords over these intervals satisfy

$$\frac{\phi(b) - \phi(a)}{b - a} \le \frac{\phi(b') - \phi(a')}{b' - a'}$$

(see Figure 8.1).

Because $f$ satisfies properties (i) and (ii) we know $f(n) = n!$ for all $n \in \mathbf{N}$.
Now fix $n \in \mathbf{N}$ and $x \in (0, 1]$.


Exercise 8.4.21. (a) Use the convexity of $\log(f(x))$ and the three intervals
$[n - 1, n]$, $[n, n + x]$, and $[n, n + 1]$ to show

$$x \log(n) \le \log(f(n + x)) - \log(n!) \le x \log(n + 1).$$

(b) Show $\log(f(n + x)) = \log(f(x)) + \log((x + 1)(x + 2) \cdots (x + n))$.

(c) Now establish that

$$0 \le \log(f(x)) - \log\left( \frac{n^x\, n!}{(x + 1)(x + 2) \cdots (x + n)} \right) \le x \log\left( 1 + \frac{1}{n} \right).$$

(d) Conclude that

$$f(x) = \lim_{n \to \infty} \frac{n^x\, n!}{(x + 1)(x + 2) \cdots (x + n)}$$

for all $x \in (0, 1]$.

(e) Finally, show that the conclusion in (d) holds for all $x \ge 0$.

Because we have arrived at an explicit formula for $f(x)$, the function $f(x)$
must be unique. By virtue of the fact that $x!$ satisfies conditions (i), (ii), and (iii)
of the theorem, we can conclude that $x!$ is this unique function; i.e., $f(x) = x!$.
Thus, not only have we proved the theorem, but we have also discovered an
alternate representation for the factorial function called the Gauss product formula:

$$x! = \int_0^\infty t^x e^{-t}\,dt = \lim_{n \to \infty} \frac{n^x\, n!}{(x + 1)(x + 2) \cdots (x + n)} \tag{9}$$

for all $x \ge 0$. □
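The Gauss product formula lends itself to a quick numerical experiment. The sketch below evaluates the expression inside the limit for a few values of $n$; the function name `gauss_product` is ours, the product is accumulated through logarithms only to avoid overflow, and `math.gamma(x + 1)` is used as an outside reference value for $x!$ (an assumption about the library, not something the text relies on).

```python
import math

def gauss_product(x, n):
    # n^x n! / ((x+1)(x+2)...(x+n)), accumulated in logarithms to avoid overflow
    log_value = x * math.log(n)
    for k in range(1, n + 1):
        log_value += math.log(k) - math.log(x + k)
    return math.exp(log_value)

x = 0.5
for n in (10, 1000, 100000):
    # second column: finite-n product; third column: library reference for x!
    print(n, gauss_product(x, n), math.gamma(x + 1))
```

The convergence is slow: in experiments like this the error appears to shrink roughly in proportion to $1/n$, so the formula is more valuable as a characterization of $x!$ than as a way of computing it.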

What happens if $x < 0$? The integral in Definition 8.4.10 becomes improper
for a second reason when $x < 0$ because $t^x$ is unbounded and undefined at $t = 0$.
If $-1 < x < 0$, it is not hard to show that the integral still converges. On the
other hand, the functional equation in Exercise 8.4.20(b) provides a natural way
to extend the definition of $x!$ to all of $\mathbf{R}$. Just as in Exercise 8.4.8, the resulting
function is never zero, alternating between positive and negative components
with vertical asymptotes at $x = -1, -2, -3, \ldots$.

The Gamma Function 

The focus of our discussion has been on the ingredients that go into the
definition of $x!$ (improper integrals, proper definitions of exponential functions,
differentiating under the integral sign), but the end result is a function worthy
of its own separate chapter. Since its discovery by Euler, the factorial function
has become ubiquitous in numerous branches of analysis.

One of the early modifications that occurred was a shift in the domain of
$x!$ and a change in the notation. Adrien Marie Legendre introduced the Greek
letter $\Gamma$ (gamma) and set

$$\Gamma(x) = (x - 1)! = \int_0^\infty t^{x-1} e^{-t}\,dt,$$

so that $\Gamma(n + 1) = n!$ and $x\Gamma(x) = \Gamma(x + 1)$. This convention eventually became
the standard, and so it is the gamma function that routinely appears in formulas
from number theory, probability, geometry, and beyond.

Philip Davis’s article on the history of the gamma function (see [11]) is an
excellent place to get a sense of the important role the gamma function has
played in the development of analysis.² Davis’s essay seems to be at least part
of the inspiration for a wonderful series of articles by David Fowler that explore
the properties of $x!$ in an original and accessible way. Here is one of the
anecdotes Fowler offers, which serves as an enticing clue for how intricately the
gamma/factorial function is connected to the larger mathematical landscape.

Recall that when $x!$ is extended to all of $\mathbf{R}$ via the functional equation
$x! = x(x - 1)!$ we get asymptotes at every negative integer. Thus, there is a
compelling reason to consider the reciprocal function $1/x!$, which we can take to
be zero for $x = -1, -2, -3, \ldots$.

Exercise 8.4.22. (a) Where does $g(x) = \frac{1}{x!\,(-x)!}$ equal zero? What other
familiar function has the same set of roots?

(b) The function $e^{-x^2}$ provides the raw material for the all-important Gaussian
bell curve from probability, where it is known that $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.
Use this fact (and some standard integration techniques) to evaluate $(1/2)!$.

(c) Now use (a) and (b) to conjecture a striking relationship between the
factorial function and a well-known function from trigonometry.

Exercise 8.4.23. As a parting shot, use the value for $(1/2)!$ and the Gauss
product formula in equation (9) to derive the famous product formula for $\pi$
discovered by John Wallis in the 1650s:

$$\frac{\pi}{2} = \lim_{n \to \infty} \left( \frac{2 \cdot 2}{1 \cdot 3} \right) \left( \frac{4 \cdot 4}{3 \cdot 5} \right) \left( \frac{6 \cdot 6}{5 \cdot 7} \right) \cdots \left( \frac{2n \cdot 2n}{(2n - 1)(2n + 1)} \right).$$
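Before attempting the derivation in Exercise 8.4.23, it can be reassuring to watch the Wallis product approach $\pi/2$ numerically. This is a minimal sketch; the function name `wallis_partial_product` is ours, and the computation is evidence only, not a substitute for the argument the exercise asks for.

```python
import math

def wallis_partial_product(n):
    # product over k = 1..n of (2k * 2k) / ((2k - 1)(2k + 1))
    product = 1.0
    for k in range(1, n + 1):
        product *= (2 * k) * (2 * k) / ((2 * k - 1) * (2 * k + 1))
    return product

for n in (10, 1000, 100000):
    print(n, wallis_partial_product(n), math.pi / 2)
```

Every factor exceeds 1, so the partial products increase toward the limit from below.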

8.5 Fourier Series 

In his famous treatise, Théorie Analytique de la Chaleur (The Analytical Theory
of Heat), 1822, Joseph Fourier (1768-1830) boldly asserts, “Thus there is
no function $f(x)$, or part of a function, which cannot be expressed by a
trigonometric series.”⁴

It is difficult to exaggerate the mathematical richness of this idea. It has been
convincingly argued by mathematical historians that the ensuing investigation
into the validity of Fourier’s conjecture was the fundamental catalyst for the
pursuit of rigor that characterizes 19th century mathematics. Power series had
been in wide use in the 150 years leading up to Fourier’s work, largely because
they behaved so well under the operations of calculus. A function expressed
as a power series is continuous, differentiable an infinite number of times, and
can be integrated and differentiated as though it were a polynomial. In the
presence of such agreeable behavior, there was no compelling reason for
mathematicians to formulate a more precise understanding of “limit” or “convergence”
because there were no arguments to resolve. Fourier’s successful implementation
of trigonometric series to the study of heat flow changed all of this. To
understand what the fuss was really about, we need to look more closely at what
Fourier was asserting, focusing individually on the terms “function,” “express,”
and “trigonometric series.”

² Exercise 8.4.1, as well as the insight of comparing the development of $x!$ to $2^x$, are
borrowed from this piece.

³ Exercise 8.4.8 is borrowed from Fowler’s treatment in [15].

⁴ Quoted passages in this section are taken from [9].


Trigonometric Series 

The basic principle behind any series representation is to express a given
function $f(x)$ as a sum of simpler functions. For power series, the component
functions are $\{1, x, x^2, x^3, \ldots\}$, so that the series takes the form

$$f(x) = \sum_{n=0}^{\infty} a_n x^n = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots.$$

A trigonometric series is a very different type of infinite series where the
functions

$$\{1, \cos(x), \sin(x), \cos(2x), \sin(2x), \cos(3x), \sin(3x), \ldots\}$$

serve as the components. Thus, a trigonometric series has the form

$$\begin{aligned}
f(x) &= a_0 + a_1\cos(x) + b_1\sin(x) + a_2\cos(2x) + b_2\sin(2x) + a_3\cos(3x) + \cdots \\
&= a_0 + \sum_{n=1}^{\infty} a_n\cos(nx) + b_n\sin(nx).
\end{aligned}$$

The idea of representing a function in this way was not completely new when
Fourier first publicly proposed it in 1807. About 50 years earlier, Jean Le Rond
d’Alembert (1717-1783) published the partial differential equation

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial^2 u}{\partial t^2} \tag{1}$$

as a means of describing the motion of a vibrating string. In this model, the
function $u(x,t)$ represents the displacement of the string at time $t \ge 0$ and at
some point $x$, which we will take to be in the interval $[0, \pi]$. Because the string
is understood to be attached at each end of this interval, we have

$$u(0,t) = 0 \quad \text{and} \quad u(\pi,t) = 0 \tag{2}$$

for all values of $t \ge 0$. Now, at $t = 0$, the string is displaced some initial amount,
and at the moment it is released we assume

$$\frac{\partial u}{\partial t}(x, 0) = 0, \tag{3}$$

meaning that, although the string immediately starts to move, it is given no
initial velocity at any point. Finding a function $u(x,t)$ that satisfies
equations (1), (2), and (3) is not too difficult.

Exercise 8.5.1. (a) Verify that

$$u(x,t) = b_n \sin(nx)\cos(nt)$$

satisfies equations (1), (2), and (3) for any choice of $n \in \mathbf{N}$ and $b_n \in \mathbf{R}$.
What goes wrong if $n \notin \mathbf{N}$?

(b) Explain why any finite sum of functions of the form given in part (a)
would also satisfy (1), (2), and (3). (Incidentally, it is possible to hear
the different solutions in (a) for values of $n$ up to 4 or 5 by isolating the
harmonics on a well-made stringed instrument.)
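The three conditions in Exercise 8.5.1(a) can be sanity-checked with finite differences before verifying them by hand. In the sketch below the mode number `n = 3`, the amplitude `b = 0.7`, the sample point, and the step sizes are arbitrary choices made for illustration; the helper names are ours.

```python
import math

def u(x, t, n=3, b=0.7):
    # candidate solution b * sin(n x) * cos(n t)
    return b * math.sin(n * x) * math.cos(n * t)

def second_difference(g, s, h=1e-4):
    # central difference approximation to g''(s)
    return (g(s + h) - 2 * g(s) + g(s - h)) / (h * h)

def wave_residual(x, t, n=3, b=0.7):
    # approximate value of u_xx - u_tt, which equation (1) says should vanish
    u_xx = second_difference(lambda s: u(s, t, n, b), x)
    u_tt = second_difference(lambda s: u(x, s, n, b), t)
    return u_xx - u_tt

def initial_velocity(x, n=3, b=0.7, h=1e-6):
    # central difference approximation to u_t(x, 0), as in equation (3)
    return (u(x, h, n, b) - u(x, -h, n, b)) / (2 * h)

print(wave_residual(1.1, 0.4))          # equation (1)
print(u(0.0, 0.4), u(math.pi, 0.4))     # equation (2)
print(initial_velocity(1.1))            # equation (3)
```

All printed values should be zero up to rounding and discretization error. Rerunning with a non-integer `n` is one way to explore the last question in part (a).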

Now, we come to the truly interesting issue. We have just seen that any
function of the form

$$u(x,t) = \sum_{n=1}^{N} b_n \sin(nx)\cos(nt) \tag{4}$$

solves d’Alembert’s wave equation, as it is called, but the particular solution we
want depends on how the string is originally “plucked.” At time $t = 0$, we will
assume that the string is given some initial displacement $f(x) = u(x, 0)$. Setting
$t = 0$ in our family of solutions in (4), the hope is that the initial displacement
function $f(x)$ can be expressed as

$$f(x) = \sum_{n=1}^{N} b_n \sin(nx). \tag{5}$$

What this means is that if there exist suitable coefficients $b_1, \ldots, b_N$ so that
$f(x)$ can be written as a sum of sine functions as in (5), then the vibrating-string
problem is completely solved by the function $u(x,t)$ given in (4). The obvious
question to ask, then, is just what types of functions can be constructed as
linear combinations of the functions $\{\sin(x), \sin(2x), \sin(3x), \ldots\}$. How general
can $f(x)$ be? Daniel Bernoulli (1700-1782) is usually credited with proposing
the idea that by taking an infinite sum in equation (5), it may be possible to
represent any initial position $f(x)$ over the interval $[0, \pi]$.

Fourier was studying the propagation of heat when trigonometric series 
resurfaced in his work in a very similar way. For Fourier, f(x) represented 
an initial temperature applied to the boundary of some heat-conducting mate- 
rial. The differential equations describing heat flow are slightly different from 
d’Alembert’s wave equation, but they still involve the second derivatives that 
make expressing f(x) as a sum of trigonometric functions the crucial step in 
finding a solution. 

Periodic Functions 

In the early stages of his work, Fourier focused his attention on even functions
(i.e., functions satisfying $f(x) = f(-x)$) and sought out ways to represent them
as series of the form $\sum a_n \cos(nx)$. Eventually, he arrived at the more general
formulation of the problem, which is to find suitable coefficients $(a_n)$ and $(b_n)$
to express a function $f(x)$ as

$$f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(nx) + b_n \sin(nx). \tag{6}$$

As we begin to explore how arbitrary $f(x)$ can be, it is important to notice
that every component of the series in equation (6) is periodic with period $2\pi$.
Turning our attention to the term “function,” it now follows that any function
we hope to represent by a trigonometric series will necessarily be periodic as
well. We will give primary attention to the interval $(-\pi, \pi]$. What this means
is that, given a function such as $f(x) = x^2$, we will restrict our attention to $f$
over the domain $(-\pi, \pi]$ and then extend $f$ periodically to all of $\mathbf{R}$ via the rule
$f(x) = f(x + 2k\pi)$ for all $k \in \mathbf{Z}$ (Fig. 8.2).

This convention of focusing on just the part of $f(x)$ over the interval $(-\pi, \pi]$
hardly seems controversial, but it did generate some confusion in Fourier’s time.
In Sections 1.2 and 4.1, we alluded to the fact that in the early 1800s the term
“function” was used to mean something more like “formula.” It was generally
believed that a function’s behavior over the interval $(-\pi, \pi]$ determined its
behavior everywhere else, a point of view that follows naturally from an overly
zealous faith in Taylor series. The modern definition of function given in
Definition 1.2.3 is attributed to Dirichlet from the 1830s, although the idea had
been suggested earlier by others. In Théorie Analytique de la Chaleur, Fourier
clarifies his own use of the term by stating that a “function $f(x)$ represents a
succession of values or ordinates, each of which is arbitrary. . . We do not
suppose these ordinates to be subject to a common law; they succeed each other in
any manner whatever, and each of them is given as if it were a single quantity.”

In the end, we will need to make a few assumptions about the nature of
our functions, but the requirements we will need are quite mild, especially
when compared with restrictions such as “infinitely differentiable,” which are
necessary (but not sufficient) for the existence of a Taylor series representation.


Types of Convergence

This brings us to a discussion of the word “expressed.” The assumptions we
must ultimately place on our function depend on the kind of convergence we
aim to demonstrate. How are we to understand the equal sign in equation (6)?
Our usual course of action with infinite series is first to define the partial sum

$$S_N(x) = a_0 + \sum_{n=1}^{N} a_n \cos(nx) + b_n \sin(nx). \tag{7}$$

To “express $f(x)$ as a trigonometric series” then means finding coefficients
$(a_n)_{n=0}^{\infty}$ and $(b_n)_{n=1}^{\infty}$ so that

$$f(x) = \lim_{N \to \infty} S_N(x). \tag{8}$$

The question remains as to what kind of limit this is. Fourier probably imagined
something akin to a pointwise limit because the concept of uniform convergence
had not yet been formulated. In addition to pointwise convergence and uniform
convergence, there are still other ways to interpret the limit in equation (8).
Although it won’t be discussed here, it turns out that proving

$$\int_{-\pi}^{\pi} |S_N(x) - f(x)|^2\,dx \to 0$$

is a natural way to understand equation (8) for a particular class of functions.
This is referred to as $L^2$ convergence. An alternate type of convergence that we
will discuss, called Cesaro mean convergence, relies on demonstrating that the
averages of the partial sums converge, in our case uniformly, to $f(x)$.


Fourier Coefficients 


In the discussion that follows, we are going to need a few calculus facts. 

Exercise 8.5.2. Using trigonometric identities when necessary, verify the
following integrals.

(a) For all $n \in \mathbf{N}$,

$$\int_{-\pi}^{\pi} \cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(nx)\,dx = 0.$$

(b) For all $n \in \mathbf{N}$,

$$\int_{-\pi}^{\pi} \cos^2(nx)\,dx = \pi \quad \text{and} \quad \int_{-\pi}^{\pi} \sin^2(nx)\,dx = \pi.$$

(c) For all $m, n \in \mathbf{N}$,

$$\int_{-\pi}^{\pi} \cos(mx)\sin(nx)\,dx = 0.$$

For $m \ne n$,

$$\int_{-\pi}^{\pi} \cos(mx)\cos(nx)\,dx = 0 \quad \text{and} \quad \int_{-\pi}^{\pi} \sin(mx)\sin(nx)\,dx = 0.$$
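The integrals in Exercise 8.5.2 are easy to spot-check numerically. The sketch below treats the integral of a product over $[-\pi, \pi]$ as a kind of dot product, here called `inner`; that name, the `simpson` helper, and the particular indices tried are our own choices for illustration.

```python
import math

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def inner(g1, g2):
    # integral of g1 * g2 over [-pi, pi]
    return simpson(lambda x: g1(x) * g2(x), -math.pi, math.pi, 2000)

def cos_n(n):
    return lambda x: math.cos(n * x)

def sin_n(n):
    return lambda x: math.sin(n * x)

print(inner(cos_n(2), cos_n(2)), math.pi)   # part (b): should agree
print(inner(cos_n(2), sin_n(3)))            # part (c): should be near zero
print(inner(sin_n(2), sin_n(5)))            # part (c), m != n: near zero
```

A numerical check like this confirms particular cases only; the trigonometric identities are what give the result for all $m$ and $n$.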


The consequences of these results are much more interesting than their 
proofs. The intuition from inner-product spaces is useful. Interpreting the 
integral as a kind of dot product, this exercise can be summarized by saying 
that the functions 


{1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), . . . } 


are all orthogonal to each other. The content of what follows is that they in 
fact form a basis for a large class of functions. 

The first order of business is to deduce some reasonable candidates for the
coefficients $(a_n)$ and $(b_n)$ in equation (6). Given a function $f(x)$, the trick is
to assume we are in possession of a representation described in (6) and then
manipulate this equation in a way that leads to formulas for $(a_n)$ and $(b_n)$.
This is exactly how we proceeded with Taylor series expansions in Section 6.6.
Taylor’s formula for the coefficients was produced by repeatedly differentiating
each side of the desired representation equation. Here, we integrate.

To compute $a_0$, integrate each side of equation (6) from $-\pi$ to $\pi$, brazenly
take the integral inside the infinite sum, and use Exercise 8.5.2 to get

$$\begin{aligned}
\int_{-\pi}^{\pi} f(x)\,dx &= \int_{-\pi}^{\pi} \left[ a_0 + \sum_{n=1}^{\infty} a_n\cos(nx) + b_n\sin(nx) \right] dx \\
&= \int_{-\pi}^{\pi} a_0\,dx + \sum_{n=1}^{\infty} \int_{-\pi}^{\pi} \left[ a_n\cos(nx) + b_n\sin(nx) \right] dx \\
&= a_0(2\pi) + \sum_{n=1}^{\infty} \left( a_n \cdot 0 + b_n \cdot 0 \right) = a_0(2\pi).
\end{aligned}$$

Thus,

$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx. \tag{9}$$

The switching of the sum and the integral sign in the second step of the previous
calculation should rightly raise some eyebrows, but keep in mind that we are
really working backward from a hypothetical representation for $f(x)$ to get a
proposal for what $a_0$ should be. The point is not to justify the derivation of the
formula but rather to show that using this value for $a_0$ ultimately gives us the
representation we want. That hard work lies ahead.

Now, consider a fixed $m \ge 1$. To compute $a_m$, we first multiply each side of
equation (6) by $\cos(mx)$ and again integrate over the interval $[-\pi, \pi]$.

Exercise 8.5.3. Derive the formulas

$$a_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(mx)\,dx \quad \text{and} \quad b_m = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(mx)\,dx \tag{10}$$

for all $m \ge 1$.
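The recipes in equations (9) and (10) translate directly into a small numerical routine. The sketch below is one possible implementation under stated assumptions: the name `fourier_coefficients`, the Simpson quadrature, and the test function $f(x) = x$ are our own choices, and the reference values $b_m = 2(-1)^{m+1}/m$ for $f(x) = x$ come from a standard integration by parts rather than from the text.

```python
import math

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def fourier_coefficients(f, max_m, panels=4000):
    # a_0 from equation (9); a_m and b_m for m = 1..max_m from equation (10)
    a0 = simpson(f, -math.pi, math.pi, panels) / (2 * math.pi)
    a = [simpson(lambda x: f(x) * math.cos(m * x), -math.pi, math.pi, panels) / math.pi
         for m in range(1, max_m + 1)]
    b = [simpson(lambda x: f(x) * math.sin(m * x), -math.pi, math.pi, panels) / math.pi
         for m in range(1, max_m + 1)]
    return a0, a, b

a0, a, b = fourier_coefficients(lambda x: x, 5)
print(a0, a)   # f(x) = x is odd, so these should all be near zero
print(b)       # compare with 2 * (-1) ** (m + 1) / m for m = 1..5
```

Numerical coefficients like these are a convenient check on hand computations, although they say nothing yet about whether the resulting series converges back to $f$.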

Let’s take a short break and empirically test our recipes for $(a_m)$ and $(b_m)$
on a few simple functions.

Example 8.5.1. Let

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < \pi \\ 0 & \text{if } x = 0 \text{ or } x = \pi \\ -1 & \text{if } -\pi < x < 0. \end{cases}$$

The fact that $f$ is an odd function (i.e., $f(-x) = -f(x)$) means we can avoid
doing any integrals for the moment and just appeal to a symmetry argument to
conclude

$$a_0 = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x)\,dx = 0 \quad \text{and} \quad a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\cos(nx)\,dx = 0$$

for all $n \ge 1$. We can also simplify the integral for $b_n$ by writing

$$\begin{aligned}
b_n &= \frac{1}{\pi} \int_{-\pi}^{\pi} f(x)\sin(nx)\,dx = \frac{2}{\pi} \int_0^{\pi} \sin(nx)\,dx \\
&= \frac{2}{\pi} \left( \frac{-1}{n}\cos(nx) \Big|_0^{\pi} \right) = \begin{cases} 4/(n\pi) & \text{if } n \text{ is odd} \\ 0 & \text{if } n \text{ is even.} \end{cases}
\end{aligned}$$

Figure 8.3: $f$, $S_4$, AND $S_{20}$ ON $[-\pi, \pi]$.

Proceeding on blind faith, we plug these results into equation (6) to get the
representation

$$f(x) = \frac{4}{\pi} \sum_{n=0}^{\infty} \frac{1}{2n + 1} \sin((2n + 1)x).$$

A graph of a few of the partial sums of this series (Fig. 8.3) should generate
some optimism about the legitimacy of what is happening.
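The series from Example 8.5.1 can also be probed numerically at individual points. The sketch below sums the first `terms` terms of the series; the function name and the sample points are our own choices, and the output is meant as empirical evidence rather than a convergence proof.

```python
import math

def square_wave_partial_sum(x, terms):
    # (4/pi) * sum over n = 0..terms-1 of sin((2n+1)x) / (2n+1)
    total = 0.0
    for n in range(terms):
        total += math.sin((2 * n + 1) * x) / (2 * n + 1)
    return 4.0 / math.pi * total

for terms in (2, 10, 1000):
    # at x = pi/2 the function equals 1; at x = 0 it equals 0
    print(terms,
          square_wave_partial_sum(math.pi / 2, terms),
          square_wave_partial_sum(0.0, terms))
```

Evaluating the same sums at points very close to 0 is a natural next experiment, since that is where the function jumps.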


Exercise 8.5.4. (a) Referring to the previous example, explain why we can
be sure that the convergence of the partial sums to $f(x)$ is not uniform
on any interval containing 0.

(b) Repeat the computations of Example 8.5.1 for the function $g(x) = |x|$
and examine graphs for some partial sums. This time, make use of the
fact that $g$ is even ($g(x) = g(-x)$) to simplify the calculations. By just
looking at the coefficients, how do we know this series converges uniformly
to something?


(c) Use graphs to collect some empirical evidence regarding the question of 
term-by-term differentiation in our two examples to this point. Is it pos- 
sible to conclude convergence or divergence of either differentiated series 
by looking at the resulting coefficients? Theorem 6.4.3 is about the legiti- 
macy of term-by-term differentiation. Can it be applied to either of these 
examples? 


The Riemann-Lebesgue Lemma

In the examples we have seen to this point, the sequences of Fourier coefficients
$(a_n)$ and $(b_n)$ all tend to 0 as $n \to \infty$. This is always the case. Understanding
why this happens is crucial to our upcoming convergence proof.

We start with a simple observation. The reason

$$\int_{-\pi}^{\pi} \sin(x)\,dx = 0$$

is that the positive and negative portions of the sine curve cancel each other
out. The same is true of

$$\int_{-\pi}^{\pi} \sin(nx)\,dx = 0.$$

Now, when $n$ is large, the period of the oscillations of $\sin(nx)$ becomes very
short, $2\pi/n$ to be precise. If $h(x)$ is a continuous function, then the values
of $h$ do not vary too much as $\sin(nx)$ ranges over each short period. The
result is that the successive positive and negative oscillations of the product
$h(x)\sin(nx)$ (Fig. 8.4) are nearly the same size so that the cancellation leads to
a small value for

$$\int_{-\pi}^{\pi} h(x)\sin(nx)\,dx.$$

Theorem 8.5.2 (Riemann-Lebesgue Lemma). Assume $h(x)$ is continuous
on $(-\pi, \pi]$. Then,

$$\int_{-\pi}^{\pi} h(x)\sin(nx)\,dx \to 0 \quad \text{and} \quad \int_{-\pi}^{\pi} h(x)\cos(nx)\,dx \to 0$$

as $n \to \infty$.

Proof. Remember that, like all of our functions from here on, we are mentally
extending $h$ to be $2\pi$-periodic. Thus, while our attention is generally focused
on the interval $(-\pi, \pi]$, the assumption of continuity is intended to mean that
the periodically extended $h$ is continuous on all of $\mathbf{R}$. Note that in addition to
continuity on $(-\pi, \pi]$, this amounts to insisting that $\lim_{x \to -\pi^+} h(x) = h(\pi)$.

Exercise 8.5.5. Explain why $h$ is uniformly continuous on $\mathbf{R}$.

Given $\epsilon > 0$, choose $\delta > 0$ such that $|x - y| < \delta$ implies $|h(x) - h(y)| < \epsilon/2$. The
period of $\sin(nx)$ is $2\pi/n$, so choose $N$ large enough so that $\pi/n < \delta$ whenever
$n \ge N$. Now, consider a particular interval $[a,b]$ of length $2\pi/n$ over which
$\sin(nx)$ moves through one complete oscillation.

Exercise 8.5.6. Show that $\left| \int_a^b h(x)\sin(nx)\,dx \right| < \epsilon/n$, and use this fact to
complete the proof. □
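The decay promised by the Riemann-Lebesgue Lemma is easy to observe numerically. The sketch below uses $h(x) = x^2$, whose periodic extension is continuous; this choice, the helper names, and the panel count are assumptions made for the illustration. The reference value $4\pi(-1)^n/n^2$ for the cosine integral is a standard computation (two integrations by parts), not a formula taken from the text.

```python
import math

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

def h_fn(x):
    # illustrative continuous function with h(-pi) = h(pi)
    return x * x

def cos_integral(n, panels=20000):
    # integral of h(x) cos(nx) over [-pi, pi]
    return simpson(lambda x: h_fn(x) * math.cos(n * x), -math.pi, math.pi, panels)

def sin_integral(n, panels=20000):
    # integral of h(x) sin(nx) over [-pi, pi]; zero here because h is even
    return simpson(lambda x: h_fn(x) * math.sin(n * x), -math.pi, math.pi, panels)

for n in (1, 10, 50, 100):
    print(n, cos_integral(n), 4 * math.pi * (-1) ** n / n ** 2, sin_integral(n))
```

For this smooth choice of $h$ the integrals shrink like $1/n^2$; the lemma itself promises only convergence to zero, with no particular rate.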


Applications of Fourier series are not restricted to continuous functions (Ex- 
ample 8.5.1). Even though our particular proof makes use of continuity, the 
Riemann-Lebesgue lemma holds under much weaker hypotheses. It is true, 
however, that any proof of this fact ultimately takes advantage of the cancella- 
tion of positive and negative components. Recall from Chapter 2 that this type 
of cancellation is the mechanism that distinguishes conditional convergence from 
absolute convergence. In the end, what we discover is that, unlike power series,
Fourier series can converge conditionally. This makes them less robust, perhaps,
but more versatile and capable of more interesting behavior.


A Pointwise Convergence Proof 

Let’s return once more to Fourier’s claim that every “function” can be
“expressed” as a trigonometric series. Our recipe for the Fourier coefficients in
equations (9) and (10) implicitly requires that our function be integrable. This
is the major motivation for Riemann’s modification of Cauchy’s definition of
the integral. Because integrability is a prerequisite for producing a Fourier
series, we would like the class of integrable functions to be as large as possible.
The natural question to ask now is whether Riemann integrability is enough
or whether we need to make some additional assumptions about $f$ in order to
guarantee that the Fourier series converges back to $f$. The answer depends on
the type of convergence we hope to establish.


$$f(x) = a_0 + \sum_{n=1}^{\infty} a_n\cos(nx) + b_n\sin(nx)$$

pointwise convergence
uniform convergence
$L^2$ convergence
Cesaro mean convergence

bounded
integrable
continuous
differentiable
$f'$ continuous


There is no tidy way to summarize the situation. For pointwise convergence, 
integrability is not enough. At present, “integrable” for us means Riemann- 
integrable, which we have only rigorously defined for bounded functions. In 
1966, Lennart Carleson proved (via an extremely complicated argument) that 
the Fourier series for such a function converges pointwise at every point in 
the domain excluding possibly a set of measure zero. This term surfaced in our 
discussion of the Cantor set (Section 3.1) and is defined rigorously in Section 7.6. 
Sets of measure zero are small in one sense, but they can be uncountable, and 
there are examples of continuous functions with Fourier series that diverge at 
uncountably many points. Lebesgue’s modification of Riemann’s integral in 
1901 proved to be a much more natural setting for Fourier analysis. Carleson’s
proof is really about Lebesgue-integrable functions which are allowed to be
unbounded but for which $\int_{-\pi}^{\pi} |f|^2$ is finite. One of the cleanest theorems in
this area states that, for this class of square Lebesgue-integrable functions, the
Fourier series always converges to the function from which it was derived if
we interpret convergence in the $L^2$ sense described earlier. As a final warning
about how fragile the situation is, there is an example due to A. Kolmogorov
(1903-1987) of a Lebesgue-integrable function where the Fourier series fails to
converge at any point.

Although all of these results require significantly more background to pursue 
in any rigorous way, we are in a position to prove some important theorems that 
require a few extra assumptions about the function in question. We will content 
ourselves with two interesting results in this area. 

Theorem 8.5.3. Let $f(x)$ be continuous on $(-\pi, \pi]$, and let $S_N(x)$ be the $N$th
partial sum of the Fourier series described in equation (7), where the coefficients
$(a_n)$ and $(b_n)$ are given by equations (9) and (10). It follows that

$$\lim_{N \to \infty} S_N(x) = f(x)$$

pointwise at any $x \in (-\pi, \pi]$ where $f'(x)$ exists.


Proof. Cataloging a few preliminary facts makes for a smoother argument. 


Fact 1: (a) $\cos(\alpha - \theta) = \cos(\alpha)\cos(\theta) + \sin(\alpha)\sin(\theta)$.
(b) $\sin(\alpha + \theta) = \sin(\alpha)\cos(\theta) + \cos(\alpha)\sin(\theta)$.

Fact 2: $\frac{1}{2} + \cos(\theta) + \cos(2\theta) + \cos(3\theta) + \cdots + \cos(N\theta) = \frac{\sin((N + 1/2)\theta)}{2\sin(\theta/2)}$ for
any $\theta \ne 2n\pi$.

Facts 1(a) and 1(b) are familiar trigonometric identities. Fact 2 is not as
familiar. Its proof (which we omit) is most easily derived by taking the real part
of a geometric sum of complex exponentials. The function in Fact 2 is called the
Dirichlet kernel in honor of the mathematician responsible for the first rigorous
convergence proof of this kind. Integrating both sides of this identity leads to
our next important fact.

Fact 3: Setting

$$D_N(\theta) = \begin{cases} \dfrac{\sin((N + 1/2)\theta)}{2\sin(\theta/2)} & \text{if } \theta \ne 2n\pi \\ 1/2 + N & \text{if } \theta = 2n\pi \end{cases}$$

from Fact 2, we see that

$$\int_{-\pi}^{\pi} D_N(\theta)\,d\theta = \pi.$$
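Facts 2 and 3 can both be spot-checked numerically. The sketch below compares the cosine sum with the closed form at a sample angle and then integrates the kernel over $[-\pi, \pi]$. The function names, the choice $N = 8$, and the tolerance used to detect multiples of $2\pi$ are our own assumptions for the illustration.

```python
import math

def dirichlet_sum(theta, N):
    # left-hand side of Fact 2: 1/2 + cos(theta) + ... + cos(N theta)
    return 0.5 + sum(math.cos(n * theta) for n in range(1, N + 1))

def dirichlet_closed(theta, N):
    # closed form from Fact 3, with the value 1/2 + N at multiples of 2*pi
    s = math.sin(theta / 2)
    if abs(s) < 1e-12:
        return 0.5 + N
    return math.sin((N + 0.5) * theta) / (2 * s)

def simpson(g, lo, hi, panels):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    h = (hi - lo) / panels
    total = g(lo) + g(hi)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3

N = 8
print(dirichlet_sum(1.0, N), dirichlet_closed(1.0, N))   # Fact 2 at theta = 1
print(simpson(lambda th: dirichlet_sum(th, N), -math.pi, math.pi, 2000), math.pi)  # Fact 3
```

The value $\pi$ in Fact 3 comes entirely from the constant term $1/2$, since each cosine integrates to zero over a full period.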


Although we will not restate it, the last fact we will use is the Riemann- 
Lebesgue Lemma. 

Fix a point $x \in (-\pi, \pi]$. The first step is to simplify the expression for
$S_N(x)$. Now $x$ is a fixed constant at the moment, so we will write the integrals
in equations (9) and (10) using $t$ as the variable of integration. Keeping an eye
on Facts 1(a) and 2, we get that

$$\begin{aligned}
S_N(x) &= a_0 + \sum_{n=1}^{N} a_n\cos(nx) + b_n\sin(nx) \\
&= \left[ \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\,dt \right] + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\cos(nt)\,dt \right] \cos(nx) \\
&\qquad + \sum_{n=1}^{N} \left[ \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\sin(nt)\,dt \right] \sin(nx) \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt)\cos(nx) + \sin(nt)\sin(nx) \right] dt \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \left[ \frac{1}{2} + \sum_{n=1}^{N} \cos(nt - nx) \right] dt \\
&= \frac{1}{\pi} \int_{-\pi}^{\pi} f(t)\,D_N(t - x)\,dt.
\end{aligned}$$

As one final simplification, let $u = t - x$. Then,

$$S_N(x) = \frac{1}{\pi} \int_{-\pi - x}^{\pi - x} f(u + x)\,D_N(u)\,du = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x)\,D_N(u)\,du.$$

The last equality is a result of our agreement to extend $f$ to be $2\pi$-periodic.
Because $D_N$ is also periodic (it is the sum of cosine functions), it does not
matter over what interval we compute the integral as long as we cover exactly
one full period.

To prove Sn(x) /(#), we must show that |5 /v0e) — f(x) \ gets arbitrarily 
small when N gets large. Having expressed Sn(x) as an integral involving 
Dn(u), we are motivated to do a similar thing for f(x). By Fact 3, 


f(x) = f(x)~ [ D N (u)du 

^ J — 7T 


1 

7 T 


*7 r 


f(x)DN(u)du , 


— 7 r 


and it follows that 

(11) S N (x) - f(x) = - f (f(u + x) - f(x))D N (u)du. 

K j — 7T 

Our goal is to show this quantity tends to zero as TV oo. A sketch of 
Dn{u) (Fig. 8.5) for a few values of N reveals why this might happen. For large 
TV, the Dirichlet kernel D^{u) has a tall, thin spike around u = 0, but this is 
precisely where f(u + x) — f(x) is small (because / is continuous). Away from 
zero, Dn(u) exhibits the fast oscillations that hearken back to the Riemann- 
Lebesgue Lemma (Theorem 8.5.2). Let’s see how to use this theorem to finish 
the argument. 
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Using Fact 1(b), we can rewrite the Dirichlet kernel as 


Dn(u) — 


sin((7V + 1/2)^) 1 

2sin('u/2) 2 


sin (Nu) cos(ia/ 2) 
sin(u/2) 


+ cos(j Nu) 


Then, equation (11) becomes 


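$$
\begin{aligned}
S_N(x) - f(x)
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(u + x) - f(x) \bigr)
   \left[ \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)} + \cos(Nu) \right] du \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(u + x) - f(x) \bigr) \frac{\sin(Nu) \cos(u/2)}{\sin(u/2)}\,du
   + \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(u + x) - f(x) \bigr) \cos(Nu)\,du \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} p_x(u) \sin(Nu)\,du
   + \frac{1}{2\pi} \int_{-\pi}^{\pi} q_x(u) \cos(Nu)\,du,
\end{aligned}
$$

where in the last step we have set

$$
p_x(u) = \frac{\bigl( f(u + x) - f(x) \bigr) \cos(u/2)}{\sin(u/2)}
\quad \text{and} \quad
q_x(u) = f(u + x) - f(x).
$$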

Exercise 8.5.7. (a) First, argue why the integral involving $q_x(u)$ tends to
zero as $N \to \infty$.

(b) The first integral is a little more subtle because the function $p_x(u)$ has the
$\sin(u/2)$ term in the denominator. Use the fact that $f$ is differentiable at
$x$ (and a familiar limit from calculus) to prove that the first integral goes
to zero as well. □


This completes the argument that $S_N(x) \to f(x)$ at any point $x$ where
$f$ is differentiable. If the derivative exists everywhere, then we get $S_N \to f$
pointwise. If we add the assumption that $f'$ is continuous, then it is not too
difficult to show that the convergence is uniform. In fact, there is a very strong
relationship between the speed of convergence of the Fourier series and the
smoothness of $f$. The more derivatives $f$ possesses, the faster the partial sums
$S_N$ converge to $f$.


Cesàro Mean Convergence

Rather than pursue the proofs in this interesting direction, we will finish this
very brief introduction to Fourier series with a look at a different type of
convergence called Cesàro mean convergence.

Exercise 8.5.8. Prove that if a sequence of real numbers $(x_n)$ converges, then
the arithmetic means

$$
y_n = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}
$$

also converge to the same limit. Give an example to show that it is possible for
the sequence of means $(y_n)$ to converge even if the original sequence $(x_n)$ does
not.
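To get a feel for the means before proving anything, one can simply compute them. The sketch below is illustrative only: the convergent sequence $x_n = 1 + 1/n$ and the helper name `cesaro_means` are assumptions chosen for the example, and the requested divergent example is deliberately left to the reader.

```python
def cesaro_means(xs):
    """Return the list of arithmetic means y_n = (x_1 + ... + x_n) / n."""
    means, running = [], 0.0
    for n, x in enumerate(xs, start=1):
        running += x
        means.append(running / n)
    return means

# Assumed example sequence: x_n = 1 + 1/n, which converges to 1.
xs = [1 + 1 / n for n in range(1, 10001)]
ys = cesaro_means(xs)
for n in (10, 100, 1000, 10000):
    print(n, xs[n - 1], ys[n - 1])
```

In this example the means approach the same limit as the sequence, only more slowly, because early terms keep contributing to every average.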


The discussion preceding Theorem 8.5.3 is intended to create a kind of
reverence for the difficulties inherent in deciphering the behavior of Fourier series,
especially in the case where the function in question is not differentiable. It is
from this humble frame of mind that the following elegant result due to L. Fejér
in 1904 can best be appreciated.


Theorem 8.5.4 (Fejér’s Theorem). Let $S_n(x)$ be the $n$th partial sum of the
Fourier series for a function $f$ on $(-\pi, \pi]$. Define

$$
\sigma_N(x) = \frac{1}{N + 1} \sum_{n=0}^{N} S_n(x).
$$

If $f$ is continuous on $(-\pi, \pi]$, then $\sigma_N(x) \to f(x)$ uniformly.


Proof. This argument is patterned after the proof of Theorem 8.5.3 but is ac- 
tually much simpler. In addition to the trigonometric formulas listed in Facts 
1 and 2, we are going to need a version of Fact 2 for the sine function, which 
looks like 


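$$
\sin(\theta) + \sin(2\theta) + \sin(3\theta) + \cdots + \sin(N\theta)
= \frac{\sin\bigl(\frac{N\theta}{2}\bigr) \sin\bigl((N + 1)\frac{\theta}{2}\bigr)}{\sin\bigl(\frac{\theta}{2}\bigr)}.
$$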


Exercise 8.5.9. Use the previous identity to show that 


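$$
\frac{1/2 + D_1(\theta) + D_2(\theta) + \cdots + D_N(\theta)}{N + 1}
= \frac{1}{2(N + 1)} \left[ \frac{\sin\bigl((N + 1)\frac{\theta}{2}\bigr)}{\sin\bigl(\frac{\theta}{2}\bigr)} \right]^2.
$$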
The expression in Exercise 8.5.9 is called the Fejér kernel and will be
denoted by $F_N(\theta)$. Analogous to the Dirichlet kernel $D_N(\theta)$ from the proof of
Theorem 8.5.3, $F_N$ is used to greatly simplify the formula for $\sigma_N(x)$.
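A numerical spot check is no substitute for the exercise, but it can catch a misremembered formula. The sketch below assumes the closed form $\frac{1}{2(N+1)} \bigl[ \sin((N+1)\theta/2) / \sin(\theta/2) \bigr]^2$ for the Fejér kernel and compares it with the average of $D_0, D_1, \ldots, D_N$ (where $D_0 = 1/2$) on a grid. The function names are illustrative.

```python
import math

def dirichlet(n, u):
    """D_n(u), with the limiting value n + 1/2 where sin(u/2) vanishes."""
    s = math.sin(u / 2)
    if abs(s) < 1e-12:
        return n + 0.5
    return math.sin((n + 0.5) * u) / (2 * s)

def fejer_average(N, u):
    """Average of the Dirichlet kernels D_0, ..., D_N at u."""
    return sum(dirichlet(n, u) for n in range(N + 1)) / (N + 1)

def fejer_closed(N, u):
    """Assumed closed form of the Fejer kernel, with limiting value (N + 1)/2."""
    s = math.sin(u / 2)
    if abs(s) < 1e-12:
        return (N + 1) / 2
    return (math.sin((N + 1) * u / 2) / s) ** 2 / (2 * (N + 1))

grid = [-math.pi + k * (2 * math.pi / 200) for k in range(201)]
for N in (1, 4, 10):
    gap = max(abs(fejer_average(N, u) - fejer_closed(N, u)) for u in grid)
    lowest = min(fejer_closed(N, u) for u in grid)
    print(N, gap, lowest)
```

The closed form makes one feature visible that the Dirichlet kernel lacks: being a square divided by a positive number, the Fejér kernel is never negative.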

Exercise 8.5.10. (a) Show that

$$
\sigma_N(x) = \frac{1}{\pi} \int_{-\pi}^{\pi} f(u + x)\, F_N(u)\,du.
$$

(b) Graph the function $F_N(u)$ for several values of $N$. Where is $F_N$ large,
and where is it close to zero? Compare this function to the Dirichlet
kernel $D_N(u)$. Now, prove that $F_N \to 0$ uniformly on any set of the form
$\{u : |u| \geq \delta\}$, where $\delta > 0$ is fixed (and $u$ is restricted to the interval
$(-\pi, \pi]$).

(c) Prove that $\int_{-\pi}^{\pi} F_N(u)\,du = \pi$.

(d) To finish the proof of Fejér’s Theorem, first choose a $\delta > 0$ so that

$$
|u| < \delta \quad \text{implies} \quad |f(x + u) - f(x)| < \epsilon.
$$

Set up a single integral that represents the difference $\sigma_N(x) - f(x)$ and
divide this integral into sets where $|u| \leq \delta$ and $|u| \geq \delta$. Explain why it is
possible to make each of these integrals sufficiently small, independently
of the choice of $x$. □
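The uniform convergence claimed by the theorem can be watched numerically for one concrete function. This is a sketch under assumptions: it takes $f(x) = |x|$ on $(-\pi, \pi]$, whose coefficients are $a_0 = \pi/2$, $a_k = 2((-1)^k - 1)/(\pi k^2)$ and $b_k = 0$, and it measures the error only on a finite grid. The function names are illustrative.

```python
import math

def a(k):
    """Cosine coefficient a_k (k >= 1) of f(x) = |x|; the sine coefficients vanish."""
    return 2 * ((-1) ** k - 1) / (math.pi * k * k)

def partial_sums(N, x):
    """Return the list [S_0(x), S_1(x), ..., S_N(x)] for f(x) = |x|."""
    s = math.pi / 2                      # S_0(x) = a_0
    out = [s]
    for k in range(1, N + 1):
        s += a(k) * math.cos(k * x)
        out.append(s)
    return out

def fejer_mean(N, x):
    """sigma_N(x): the average of S_0(x), ..., S_N(x)."""
    return sum(partial_sums(N, x)) / (N + 1)

def sup_error(N, points=401):
    """Largest value of |sigma_N(x) - |x|| over an evenly spaced grid on [-pi, pi]."""
    grid = [-math.pi + 2 * math.pi * j / (points - 1) for j in range(points)]
    return max(abs(fejer_mean(N, x) - abs(x)) for x in grid)

for N in (2, 10, 40):
    print(N, sup_error(N))
```

For this particular $f$ the ordinary partial sums also converge, so the example only shows the error of $\sigma_N$ shrinking across the whole interval at once. The strength of the theorem is that the same thing happens for every continuous $f$.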


Weierstrass Approximation Theorem 

The hard work of proving Fejér’s Theorem has many rewards, one of which
is access to a relatively short argument for a profoundly important theorem
discovered by Weierstrass in 1885. The Weierstrass Approximation Theorem
(WAT) is studied in depth in Section 6.7 and is restated here for ease of reference.

Theorem 6.7.1 (Weierstrass Approximation Theorem). Let $f : [a, b] \to \mathbf{R}$
be continuous. Given $\epsilon > 0$, there exists a polynomial $p(x)$ satisfying

$$
|f(x) - p(x)| < \epsilon
$$

for all $x \in [a, b]$.


Proof. We have actually seen a few special cases of this result before in Sec- 
tion 6.6 on Taylor series. For instance, we showed that 


$$
\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots,
$$

where the series converges uniformly on any bounded subset of R. Uniform 
convergence of a series means the partial sums converge uniformly, and the 
partial sums in this case are polynomials. Notice that this is precisely what
WAT asks us to prove, only we must do it for an arbitrary, continuous function 
in place of sin(x). 
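The special case is easy to observe numerically. The sketch below is illustrative only: the interval $[-\pi, \pi]$, the grid size, and the function names are assumptions. It evaluates the Taylor polynomials of sin(x) and reports the largest error found on the grid.

```python
import math

def taylor_sin(x, terms):
    """Sum of the first `terms` nonzero terms of the Taylor series for sin(x)."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2k + 2)(2k + 3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def sup_error(terms, radius=math.pi, points=801):
    """Largest |p(x) - sin(x)| over an evenly spaced grid on [-radius, radius]."""
    grid = [-radius + 2 * radius * j / (points - 1) for j in range(points)]
    return max(abs(taylor_sin(x, terms) - math.sin(x)) for x in grid)

for terms in (1, 3, 5, 10):
    print(terms, sup_error(terms))
```

Each polynomial is a single function whose error is small across the whole interval, which is exactly the kind of approximation WAT promises for an arbitrary continuous function.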

Using Taylor series does not work in general. To construct a Taylor series
we need the function to be infinitely differentiable, not just continuous, and
even in this case we might get a series that either does not converge or converges
to the wrong thing. Taylor series are a valuable tool, however. In Section 6.7
we used the Taylor series for $\sqrt{1 - x}$ as the starting point for a proper proof
of WAT. Fejér’s Theorem, in conjunction with the Taylor series for sin(x) and
cos(x), provides a significant shortcut to the same result.

Exercise 8.5.11. (a) Use the fact that the Taylor series for sin(x) and cos(x)
converge uniformly on any compact set to prove WAT under the added
assumption that $[a, b]$ is $[0, \pi]$.

(b) Show how the case for an arbitrary interval $[a, b]$ follows from this one. □


A comment from Section 6.7 that bears repeating relates to the striking 
contrast between this result and Weierstrass’s demonstration of a continuous 
nowhere-differentiable function. Although there exist continuous functions that 
oscillate so wildly that they fail to have a derivative at any point, these unruly 
functions are always uniformly within $\epsilon$ of an infinitely differentiable polynomial.


Approximation as a Unifying Theme 

Viewing the last section of this chapter as a kind of appendix (included to
clear up some loose ends from Chapter 1 regarding the definition of the real
numbers), the Weierstrass Approximation Theorem makes for a fitting close to
our introductory survey of some of the gems of analysis.

The idea of approximation permeates the entire subject. Every real num- 
ber can be approximated with rational ones. The value of an infinite sum is 
approximated with partial sums, and the value of a continuous function can 
be approximated with its values nearby. A function is differentiable when a 
straight line is a good approximation to the curve, and it is integrable when 
finite sums of rectangles are a good approximation to the area under the curve. 
Now, we learn that every continuous function can be approximated arbitrarily 
well with a polynomial. In every case, the approximating objects are tangi- 
ble and well-understood, and the issue is how well these properties survive the 
limiting process. By viewing the different infinities of mathematics through 
pathways crafted out of finite objects, Weierstrass and the other founders of 
analysis created a paradigm for how to extend the scope of mathematical explo- 
ration deep into territory previously unattainable. Although our journey ends 
here, the road is long and continues to be written. 



8.6 A Construction of R From Q 

This entire section is devoted to constructing a proof for the following theorem: 

Theorem 8.6.1 (Existence of the Real Numbers). There exists an ordered 
field in which every nonempty set that is bounded above has a least upper bound. 
In addition, this field contains Q as a subfield. 

There are a few terms to define before this statement can be properly under- 
stood and proved, but it can essentially be paraphrased as “the real numbers 
exist.” In Section 1.1, we encountered a major failing of the rational number 
system as a place to do analysis. Without the square root of 2 (and uncount- 
ably many other irrational numbers) we cannot confidently move from a Cauchy 
sequence to its limit because in Q there is no guarantee that such a number ex- 
ists. (A review of Sections 1.1 and 1.3 is highly recommended at this point.) 
The resolution we proposed in Chapter 1 came in the form of the Axiom of 
Completeness, which we restate. 

Axiom of Completeness. Every nonempty set of real numbers that is bounded 
above has a least upper bound. 

Now let’s be clear about how we actually proceeded in Chapter 1. This is 
the property that distinguishes Q from R, but by referring to this property as 
an axiom we were making the point that it was not something to be proved. 
The real numbers were defined simply as an extension of the rational numbers 
in which bounded sets have least upper bounds, but no attempt was made to 
demonstrate that such an extension is actually possible. Now, the time has 
finally come. By explicitly building the real numbers from the rational ones, we 
will be able to demonstrate that the Axiom of Completeness does not need to 
be an axiom at all; it is a theorem! 

There is something ironic about having the final section of this book be 
a construction of the number system that has been the underlying subject of 
every preceding page, but there is something perfectly apt about it as well. 
Through eight chapters stretching from Cantor’s Theorem to the Baire Category 
Theorem, we have come to see how profoundly the addition of completeness 
changes the landscape. We all grow up believing in the existence of real numbers, 
but it is only through a study of classical analysis that we become aware of their 
elusive and enigmatic nature. It is because completeness matters so much, and 
because it is responsible for such perplexing phenomena, that we should now 
feel obliged — compelled really — to go back to the beginning and verify that such 
a thing really exists. 

As we mentioned in Chapter 1, proceeding in this order puts us in good 
historical company. The pioneering work of Cauchy, Bolzano, Abel, Dirichlet, 
Weierstrass, and Riemann preceded (and in a very real sense led to) the host
of rigorous definitions for R that were proposed in the last half of the 19th 
century. Georg Cantor is a familiar name responsible for one of these definitions, 
but alternate constructions of the real number system also came from Charles 
Méray (1835-1911), Heinrich Heine (1821-1881), and Richard Dedekind
(1831-1916). The formulation that follows is the one due to Dedekind. In a sense it
is the most abstract of the approaches, but it is the most appropriate for us 
because the verification of completeness is done in terms of least upper bounds. 

Dedekind Cuts 

We begin this discussion by assuming that the rational numbers and all of the 
familiar properties of addition, multiplication, and order are available to us. At 
the moment, there is no such thing as a real number. 

Definition 8.6.2. A subset A of the rational numbers is called a cut if it 
possesses the following three properties: 

(c1) $A \neq \emptyset$ and $A \neq \mathbf{Q}$.

(c2) If $r \in A$, then $A$ also contains every rational $q < r$.

(c3) $A$ does not have a maximum; that is, if $r \in A$, then there exists $s \in A$
with $r < s$.
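One way to make the definition concrete is to represent a candidate cut by its membership test and probe the three properties on sample rationals. The sketch below is illustrative only: the set $\{t \in \mathbf{Q} : t^3 < 2\}$ is a hypothetical example not taken from the text, the function names are assumptions, and checking finitely many samples proves nothing.

```python
from fractions import Fraction

def in_cut(t):
    """Membership test for the hypothetical example A = {t in Q : t^3 < 2}."""
    return t ** 3 < 2

def larger_member(r):
    """Given r in A, search for some s in A with r < s, as property (c3) requires."""
    assert in_cut(r)
    step = Fraction(1)
    while not in_cut(r + step):      # halve the step until r + step lands back in A
        step /= 2
    return r + step

samples = [Fraction(n, 8) for n in range(-40, 41)]
members = [t for t in samples if in_cut(t)]
non_members = [t for t in samples if not in_cut(t)]

# (c1) on the sample: A is neither empty nor all of Q.
print(len(members) > 0 and len(non_members) > 0)
# (c2) on the sample: every sampled member lies below every sampled non-member.
print(max(members) < min(non_members))
# (c3) for one member: a strictly larger member exists.
r = Fraction(5, 4)
print(r, larger_member(r))
```

Exact rational arithmetic via `Fraction` matters here: at this stage only rational numbers are available, so the test never refers to the cube root of 2 itself.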

Exercise 8.6.1. (a) Fix $r \in \mathbf{Q}$. Show that the set $C_r = \{t \in \mathbf{Q} : t < r\}$ is a
cut.

The temptation to think of all cuts as being of this form should be avoided.
Which of the following subsets of Q are cuts?

(b) $S = \{t \in \mathbf{Q} : t \leq 2\}$

(c) $T = \{t \in \mathbf{Q} : t^2 < 2 \text{ or } t < 0\}$

(d) $U = \{t \in \mathbf{Q} : t^2 \leq 2 \text{ or } t < 0\}$

Exercise 8.6.2. Let $A$ be a cut. Show that if $r \in A$ and $s \notin A$, then $r < s$.

To dispel any suspense, let’s get right to the point. 

Definition 8.6.3. Define the real numbers R to be the set of all cuts in Q. 

This may feel awkward at first — real numbers should be numbers, not sets 
of rational numbers. The counterargument here is that when working on the 
foundations of mathematics, sets are about the most basic building blocks we 
have. We have defined a set R whose elements are subsets of Q. We now must 
set about the task of imposing some algebraic structure on R that behaves in 
a way familiar to us. What exactly does this entail? If we are serious about 
constructing a proof for Theorem 8.6.1, we need to be more specific about what 
we mean by an “ordered field.” 
Field and Order Properties

Given a set $F$ and two elements $x, y \in F$, an operation on $F$ is a function that
takes the ordered pair $(x, y)$ to a third element $z \in F$. Writing $x + y$ or $xy$
to represent different operations reminds us of the two operations that we are
trying to emulate.

Definition 8.6.4. A set $F$ is a field if there exist two operations, addition
($x + y$) and multiplication ($xy$), that satisfy the following list of conditions:

(f1) (commutativity) $x + y = y + x$ and $xy = yx$ for all $x, y \in F$.

(f2) (associativity) $(x + y) + z = x + (y + z)$ and $(xy)z = x(yz)$ for all $x, y, z \in F$.

(f3) (identities exist) There exist two special elements 0 and 1 with $0 \neq 1$ such
that $x + 0 = x$ and $x1 = x$ for all $x \in F$.

(f4) (inverses exist) Given $x \in F$, there exists an element $-x \in F$ such that
$x + (-x) = 0$. If $x \neq 0$, there exists an element $x^{-1}$ such that $x x^{-1} = 1$.

(f5) (distributive property) $x(y + z) = xy + xz$ for all $x, y, z \in F$.

Exercise 8.6.3. Using the usual definitions of addition and multiplication, 
determine which of these properties are possessed by N, Z, and Q, respectively. 

Although we will not pursue this here in any depth, all of the familiar al- 
gebraic manipulations in Q (e.g., x + y = x + z implies y = z) can be derived 
from this short list of properties. 

Definition 8.6.5. An ordering on a set $F$ is a relation, represented by $\leq$, with
the following three properties:

(o1) For arbitrary $x, y \in F$, at least one of the statements $x \leq y$ or $y \leq x$ is
true.

(o2) If $x \leq y$ and $y \leq x$, then $x = y$.

(o3) If $x \leq y$ and $y \leq z$, then $x \leq z$.

We will sometimes write $y \geq x$ in place of $x \leq y$. The strict inequality $x < y$
is used to mean $x \leq y$ but $x \neq y$.

A field $F$ is called an ordered field if $F$ is endowed with an ordering $\leq$ that
satisfies

(o4) If $y \leq z$, then $x + y \leq x + z$.

(o5) If $x \geq 0$ and $y \geq 0$, then $xy \geq 0$.

Let’s take stock of where we are. To prove Theorem 8.6.1, we are accepting 
as given that the rational numbers are an ordered field. We have defined the real 
numbers R to be the collection of cuts in Q, and the challenge now is to invent 
addition, multiplication, and an ordering so that each possesses the properties 
outlined in the preceding two definitions. The easiest of these is the ordering.
Let A and B be two arbitrary elements of R. 

Define $A \leq B$ to mean $A \subseteq B$.

Exercise 8.6.4. Show that this defines an ordering on R by verifying properties
(o1), (o2), and (o3) from Definition 8.6.5.

Algebra in R 

Given $A$ and $B$ in R, define

$$
A + B = \{a + b : a \in A \text{ and } b \in B\}.
$$

Before checking properties (f1)-(f4) for addition, we must first verify that our
definition really defines an operation. Is $A + B$ actually a cut? To get the flavor
of how these arguments look, let’s verify property (c2) of Definition 8.6.2 for
the set $A + B$.

Let $a + b \in A + B$ be arbitrary and let $s \in \mathbf{Q}$ satisfy $s < a + b$. Then,
$s - b < a$, which implies that $s - b \in A$ because $A$ is a cut. But then

$$
s = (s - b) + b \in A + B,
$$

and (c2) is proved.
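The argument for (c2) is constructive: it says exactly how to split $s$ into a piece from $A$ and a piece from $B$. The sketch below replays that step with exact rationals. The two cuts, $\{t : t^3 < 2\}$ and the rational cut $C_{1/2}$, and all names are hypothetical choices made for illustration.

```python
from fractions import Fraction

def in_A(t):
    """Hypothetical cut A = {t in Q : t^3 < 2}."""
    return t ** 3 < 2

def in_B(t):
    """Rational cut B = C_{1/2} = {t in Q : t < 1/2}."""
    return t < Fraction(1, 2)

def decompose(s, a, b):
    """Given a in A, b in B and a rational s < a + b, split s as (s - b) + b."""
    assert in_A(a) and in_B(b) and s < a + b
    return s - b, b

a, b = Fraction(5, 4), Fraction(1, 3)
for s in (Fraction(3, 2), Fraction(0), Fraction(-7, 2)):
    left, right = decompose(s, a, b)
    # Expect left in A, right in B, and left + right == s, as in the proof.
    print(s, left, right, in_A(left), in_B(right), left + right == s)
```

The point of the proof is visible in the output: the piece $s - b$ lies below $a$, so property (c2) for $A$ places it in $A$ automatically.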

Exercise 8.6.5. (a) Show that (c1) and (c3) also hold for $A + B$. Conclude
that $A + B$ is a cut.

(b) Check that addition in R is commutative (f1) and associative (f2).

(c) Show that property (o4) holds.

(d) Show that the cut

$$
O = \{p \in \mathbf{Q} : p < 0\}
$$

successfully plays the role of the additive identity (f3). (Showing $A + O = A$
amounts to proving that these two sets are the same. The standard
way to prove such a thing is to show two inclusions: $A + O \subseteq A$ and
$A \subseteq A + O$.)

What about additive inverses? Given $A \in \mathbf{R}$, we must produce a cut $-A$
with the property that $A + (-A) = O$. This is a bit more difficult than it
sounds. Conceptually, the cut $-A$ consists of all rational numbers less than
$-\sup A$. The problem is how to define this set without using suprema, which
are strictly off limits at the moment. (We are building the field in which they
exist!)

Given $A \in \mathbf{R}$, define

$$
-A = \{r \in \mathbf{Q} : \text{there exists } t \notin A \text{ with } t < -r\}.
$$
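Membership in $-A$ is an existence statement: $r$ belongs to $-A$ when some rational $t$ outside $A$ sits strictly below $-r$. The sketch below searches for such a witness. It is a bounded search under assumptions (the example cut $\{t : t^3 < 2\}$, the candidate points $-r - 2^{-k}$, and the function names are all illustrative), so a result of `None` only means no witness was found among the candidates tried.

```python
from fractions import Fraction

def in_A(t):
    """Hypothetical cut A = {t in Q : t^3 < 2}."""
    return t ** 3 < 2

def witness_for_negation(r, max_halvings=40):
    """Look for t not in A with t < -r; return it, or None if the bounded search fails."""
    step = Fraction(1)
    for _ in range(max_halvings):
        t = -r - step                # a rational strictly below -r
        if not in_A(t):
            return t
        step /= 2
    return None

for r in (Fraction(-3, 2), Fraction(-2), Fraction(-1), Fraction(0)):
    print(r, witness_for_negation(r))
```

Trying points ever closer to $-r$ from below is enough in principle, because the rationals outside a cut are closed upward: if any witness exists, every rational between it and $-r$ is also a witness.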
Exercise 8.6.6. (a) Prove that $-A$ defines a cut.

(b) What goes wrong if we set $-A = \{r \in \mathbf{Q} : -r \notin A\}$?

(c) If $a \in A$ and $r \in -A$, show $a + r \in O$. This shows $A + (-A) \subseteq O$. Now,
finish the proof of property (f4) for addition in Definition 8.6.4.

Although the ideas are similar, the technical difficulties increase when we 
try to create a definition for multiplication in R. This is largely due to the fact 
that the product of two negative numbers is positive. The standard method of 
attack is first to define multiplication on the non-negative cuts. 

Given $A \geq O$ and $B \geq O$ in R, define the product

$$
AB = \{ab : a \in A,\ b \in B \text{ with } a, b \geq 0\} \cup \{q \in \mathbf{Q} : q < 0\}.
$$


Exercise 8.6.7. (a) Show that $AB$ is a cut and that property (o5) holds.

(b) Propose a good candidate for the multiplicative identity (1) on R and
show that this works for all cuts $A \geq O$.

(c) Show the distributive property (f5) holds for non-negative cuts.


Products involving at least one negative factor can be defined in terms of the
product of two positive cuts by observing that $-A \geq O$ whenever $A \leq O$. (Given
$A \leq O$, property (o4) implies $A + (-A) \leq O + (-A)$, which yields $O \leq -A$.)
For any $A$ and $B$ in R, define

$$
AB =
\begin{cases}
\text{as given} & \text{if } A \geq O \text{ and } B \geq O \\
-[A(-B)] & \text{if } A \geq O \text{ and } B < O \\
-[(-A)B] & \text{if } A < O \text{ and } B \geq O \\
(-A)(-B) & \text{if } A < O \text{ and } B < O.
\end{cases}
$$


Verifying that multiplication defined in this way satisfies all the required field 
properties is important but uneventful. The proofs generally fall into cases for 
when terms are positive or negative and follow a pattern similar to those for 
addition. We will leave them as an unofficial exercise and move on to the punch 
line. 


Least Upper Bounds 

Having proved that R is an ordered field, we now set our sights on showing 
that this field is complete. We defined completeness in Chapter 1 in terms of 
least upper bounds. Here is a summary of the relevant definitions from that 
discussion. 
Definition 8.6.6. A set $\mathcal{A} \subseteq \mathbf{R}$ is bounded above if there exists a $B \in \mathbf{R}$ such
that $A \leq B$ for all $A \in \mathcal{A}$. The number $B$ is called an upper bound for $\mathcal{A}$.

A real number $S \in \mathbf{R}$ is the least upper bound for a set $\mathcal{A} \subseteq \mathbf{R}$ if it meets
the following two criteria:

(i) $S$ is an upper bound for $\mathcal{A}$ and

(ii) if $B$ is any upper bound for $\mathcal{A}$, then $S \leq B$.

Exercise 8.6.8. Let $\mathcal{A} \subseteq \mathbf{R}$ be nonempty and bounded above, and let $S$ be
the union of all $A \in \mathcal{A}$.

(a) First, prove that $S \in \mathbf{R}$ by showing that it is a cut.

(b) Now, show that $S$ is the least upper bound for $\mathcal{A}$.

This finishes the proof that R is complete. Notice that we could have proved 
that least upper bounds exist immediately after defining the ordering on R, but 
saving it for last gives it the privileged place in the argument it deserves. There 
is, however, still one loose end to sew up. The statement of Theorem 8.6.1 
mentions that our complete ordered field contains Q as a subfield. This is a 
slight abuse of language. What it should say is that R contains a subfield that 
looks and acts exactly like Q. 

Exercise 8.6.9. Consider the collection of so-called “rational” cuts of the form

$$
C_r = \{t \in \mathbf{Q} : t < r\}
$$

where $r \in \mathbf{Q}$. (See Exercise 8.6.1.)

(a) Show that $C_r + C_s = C_{r+s}$ for all $r, s \in \mathbf{Q}$. Verify $C_r C_s = C_{rs}$ for the
case when $r, s \geq 0$.

(b) Show that $C_r \leq C_s$ if and only if $r \leq s$ in Q.

Cantor’s Approach 

As a way of giving Georg Cantor the last word, let’s briefly look at his very 
different approach to constructing R out of Q. One of the many equivalent 
ways to characterize completeness is with the assertion that “Cauchy sequences 
converge.” Given a Cauchy sequence of rational numbers, we are now well aware 
that this sequence may converge to a value not in Q. Just as before, the goal is 
to create something, which we will call a real number, that can serve as the limit
of this sequence. Cantor’s idea was essentially to define a real number to be the 
entire Cauchy sequence. The first problem one encounters with this approach 
is the realization that two different Cauchy sequences can converge to the same 
real number. For this reason, the elements in R are more appropriately defined
as equivalence classes of Cauchy sequences where two sequences $(x_n)$ and $(y_n)$
are in the same equivalence class if and only if $(x_n - y_n) \to 0$.
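A small computation shows two visibly different rational sequences landing in the same equivalence class. The sketch below is illustrative only: the choice of target (a number whose square is 2), the two sequences (Newton's iteration starting at 1, and decimal truncations computed with the integer square root), and the function names are assumptions made for the example.

```python
from fractions import Fraction
import math

def newton_sqrt2(n):
    """n steps of x -> (x + 2/x)/2 starting from 1; every iterate is rational."""
    x = Fraction(1)
    for _ in range(n):
        x = (x + 2 / x) / 2
    return x

def decimal_sqrt2(n):
    """Largest rational k / 10^n (k an integer) whose square is at most 2."""
    return Fraction(math.isqrt(2 * 10 ** (2 * n)), 10 ** n)

for n in range(7):
    x, y = newton_sqrt2(n), decimal_sqrt2(n)
    print(n, float(x), float(y), float(abs(x - y)))
```

Neither sequence has a limit in Q, yet the difference $x_n - y_n$ is a sequence of rationals tending to 0, so the equivalence relation can be stated and checked without mentioning any real number.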
As with Dedekind’s approach, it can be momentarily disorienting to supplant
our relatively simple notion of a real number as a decimal expansion with
something as unruly as an equivalence class of Cauchy sequences. But what 
exactly do we mean by a decimal expansion? And how are we to understand 
the number 1/2 as both .5000. . . and .4999. . .? We leave it as an exercise. 